The critical need for clean and economical sources of energy
is transforming data centers from primarily energy consumers
into energy producers as well. We focus on minimizing
the operating costs of next-generation data centers that can 
jointly optimize the energy supply from on-site generators 
and the power grid, and the energy demand from servers as 
well as power conditioning and cooling systems. We formu- 
late the cost minimization problem and present an offline 
optimal algorithm. For "on-grid" data centers that use only 
the grid, we devise a deterministic online algorithm that 
achieves the best possible competitive ratio of $2 - \alpha_s$, where
$\alpha_s$ is a normalized look-ahead window size. The competitive
ratio of an online algorithm is defined as the maximum ratio 
(over all possible inputs) between the algorithm's cost (with 
no or limited look-ahead) and the offline optimal assuming 
complete future information. We remark that the results 
hold as long as the overall energy demand (including server, 
cooling, and power conditioning) is a convex and increasing 
function in the total number of active servers and also in the 
total server load. For "hybrid" data centers that have on-site
power generation in addition to the grid, we develop an online
algorithm whose competitive ratio is bounded by an expression in
the normalized look-ahead window sizes, the maximum grid power
price $P_{\max}$, and the parameters $L$, $c_o$, and $c_m$ of an
on-site generator.

Using extensive workload traces from Akamai with the 
corresponding grid power prices, we simulate our offline and 
online algorithms in a realistic setting. Our offline (resp., 
online) algorithm achieves a cost reduction of 25.8% (resp., 
20.7%) for a hybrid data center and 12.3% (resp., 7.3%) 
for an on-grid data center. The cost reductions are quite 
significant and make a strong case for a joint optimization of 
energy supply and energy demand in a data center. A hybrid 
data center provides about 13% additional cost reduction 
over an on-grid data center representing the additional cost 
benefits that on-site power generation provides over using
the grid alone. 

Categories and Subject Descriptors 

F.1.2 [Modes of Computation]: Online computation; G.1.6 
[Optimization]: Nonlinear programming; I.1.2 [Algorithms]:
Analysis of algorithms; I.2.8 [Problem Solving, Control

1. INTRODUCTION

Internet-scale cloud services that deploy large distributed 
systems of servers around the world are revolutionizing all 
aspects of human activity. The rapid growth of such services
has led to a significant increase in server deployments
in data centers around the world. Energy consumption of
data centers accounts for roughly 1.5% of the global energy
consumption and is increasing at an alarming rate of about
15% on an annual basis [21]. The surging global energy demand
relative to its supply has caused the price of electricity
to rise, even while other operating expenses of a data cen- 
ter such as network bandwidth have decreased precipitously. 
Consequently, the energy costs now represent a large frac- 
tion of the operating expenses of a data center today [9], 
and decreasing the energy expenses has become a central 
concern for data center operators. 

The emergence of energy as a central consideration for 
enterprises that operate large server farms is drastically
altering the traditional boundary between a data center and
a power utility (cf. Figure 1). Traditionally, a data center
hosts servers but buys electricity from a utility company
through the power grid. However, the criticality of the en- 
ergy supply is leading data centers to broaden their role to 
also generate much of the required power on-site, decreasing 
their dependence on a third-party utility. While data centers 
have always had generators as a short-term backup for when 
the grid fails, the use of on-site generators for sustained
power supply is a newer trend. For instance, Apple recently
announced that it will build a massive data center for its
iCloud services with 60% of its energy coming from its on-site
generators that use
"clean energy" sources such as fuel cells with biogas and solar 
panels [25]. As another example, eBay recently announced 
that it will add a 6 MW facility to its existing data center in 
Utah that will be largely powered by on-site fuel cell
generators [17]. The trend for hybrid data centers that generate
electricity on-site (cf. Figure 1) with reduced reliance on
the grid is driven by the confluence of several factors. This 
trend is also mirrored in the broader power industry where 
the centralized model for power generation with few large 
power plants is giving way to a more distributed generation 
model pT] where many smaller on-site generators produce 
power that is consumed locally over a "micro-grid". 

A key factor favoring on-site generation is the potential for 
cheaper power than the grid, especially during peak hours. 
On-site generation also reduces transmission losses that in 
turn reduce the effective cost, because the power is gener- 
ated close to where it is consumed. In addition, another 
factor favoring on-site generation is a requirement for many 
enterprises to use cleaner renewable energy sources, such as 
Apple's mandate to use 100% clean energy in its data cen- 
ters [6]. Such a mandate is more easily achievable with the 
enterprise generating all or most of its power on-site, es- 
pecially since recent advances such as the fuel cell technol- 
ogy of Bloom Energy [Jj make on-site generation economical 
and feasible. Finally, the risk of service outages caused by 
the failure of the grid, as happened recently when thunder- 
storms brought down the grid causing a denial-of-service for 
Amazon's AWS service for several hours [18], has provided 
greater impetus for on-site power generation that can sus- 
tain the data center for extended periods without the grid. 

Our work focuses on the key challenges that arise in the 
emerging hybrid model for a data center that is able to si- 
multaneously optimize both the generation and consumption 
of energy (cf. Figure 1). In the traditional scenario, the
utility is responsible for energy provisioning (EP) that has 
the goal of supplying energy as economically as possible to 
meet the energy demand, albeit the utility has no detailed 
knowledge and no control over the server workloads within 
a data center that drive the consumption of power. Optimal 
energy provisioning by the utility in isolation is characterized
by the unit commitment problem [31, 36] that has been studied
over the past decades. The energy provisioning problem
takes as input the demand for electricity from the consumers 
and determines which power generators should be used at 
what time to satisfy the demand in the most economical 
fashion. Further, in a traditional scenario, a data center is 
responsible for capacity provisioning (CP) that has the goal 
of managing its server capacity to serve the incoming work- 
load from end users while reducing the total energy demand 
of servers, as well as power conditioning and various cool- 
ing systems, but without detailed knowledge or control over 
the power generation. For instance, dynamic provisioning of 
server capacity by turning off some servers during periods of 
low workload to reduce the energy demand has been studied 
in recent years [23], [28], [10], [27].

The convergence of power generation and consumption 
within a single data center entity and the increasing impact 
of energy costs requires a new integrated approach to both 
energy provisioning (EP) and capacity provisioning (CP). 
A key contribution of our work is formulating and developing 
algorithms that simultaneously manage on-site power gen- 
eration, grid power consumption, and server capacity with 
the goal of minimizing the operating cost of the data center. 

Figure 1: While an "on-grid" data center derives all its power
from the grid, next-generation "hybrid" data centers have
additional on-site power generation.

Online vs. Offline Algorithms. In designing algorithms
for optimizing the operating cost of a hybrid data center,
there are three time-varying inputs: the server workload
$a(t)$ generated by service requests from users, the price
$p(t)$ of a unit of energy from the grid, and the total power
consumption function $g_t$, for each time $t$ where $1 \le t \le T$.
We begin by investigating offline algorithms that minimize
the operating cost with perfect knowledge of the entire input
sequence $a(t)$, $p(t)$ and $g_t$, for $1 \le t \le T$. However,
in real life, the time-varying input sequences are not knowable
in advance. In particular, the optimization must be
performed in an online fashion where decisions at time $t$
are made with the knowledge of inputs $a(\tau)$, $p(\tau)$ and $g_\tau$,
for $1 \le \tau \le t + w$, where $w \ge 0$ is a small (possibly zero)
look-ahead window. Specifically, an online algorithm has no
knowledge of inputs beyond the look-ahead window, i.e., for
time $t + w < \tau \le T$. We assume the inputs within the
look-ahead window are perfectly known when analyzing the
algorithm performance. In practice, short-term demand or grid
price can be estimated rather accurately by various techniques
including pattern analysis and time series analysis
and prediction [19], [14]. As is typical in the study of online
algorithms [12], we seek theoretical guarantees for our online
algorithms by computing the competitive ratio, that is, the ratio
of the cost achieved by the online algorithm for an input to
the optimal cost achieved for the same input by an offline
algorithm. The competitive ratio is computed under a worst
case scenario where an adversary picks the worst possible
inputs for the online algorithm. Thus, a small competitive
ratio provides a strong guarantee that the online algorithm
will achieve a cost close to the offline optimal even for the
worst case input.

Our Contributions. A key contribution of our work is to 
formulate and study data center cost minimization (DCM) 
that integrates energy procurement from the grid, energy 
production using on-site generators, and dynamic server ca- 
pacity management. Our work jointly optimizes the two 
components of DCM: energy provisioning (EP) from the 
grid and generators and capacity provisioning (CP) of the 
servers. 

• We theoretically evaluate the benefit of joint optimization
by showing that optimizing energy provisioning
(EP) and capacity provisioning (CP) separately results
in a factor loss of optimality $\rho = LP_{\max}/(Lc_o + c_m)$
compared to optimizing them jointly, where $P_{\max}$ is the
maximum grid power price, and $L$, $c_o$, and $c_m$ are the
capacity, incremental cost, and base cost of an on-site
generator, respectively. Further, we derive an efficient
offline optimal algorithm for hybrid data centers that
jointly optimizes EP and CP to minimize the data center's
operating cost.

Table 1: Summary of algorithmic results. The on-grid results
are the best possible for any deterministic online algorithm.

• For on-grid data centers, we devise an online deterministic
algorithm that achieves a competitive ratio of
$2 - \alpha_s$, where $\alpha_s \in [0, 1]$ is the normalized look-ahead
window size. Further, we show that our algorithm has
the best competitive ratio of any deterministic online
algorithm for the problem (cf. Table 1). For the more
complex hybrid data centers, we devise an online deterministic
algorithm whose competitive ratio depends on the
normalized look-ahead window sizes. Both online
algorithms perform better as the look-ahead window
increases, as they are better able to plan their current
actions based on knowledge of future inputs. Interestingly,
in the on-grid case, we show that there exists a
fixed threshold value for the look-ahead window for
which the online algorithm matches the offline optimal
in performance, achieving a competitive ratio of
1, i.e., there is no additional benefit gained by the online
algorithm if its look-ahead is increased beyond the
threshold.

• Using extensive workload traces from Akamai and the 
corresponding grid prices, we simulate our offline and 
online algorithms in a realistic setting with the goal of 
empirically evaluating their performance. Our offline 
optimal (resp., online) algorithm achieves a cost reduc- 
tion of 25.8% (resp., 20.7%) for a hybrid data center 
and 12.3% (resp., 7.3%) for an on-grid data center. 
The cost reduction is computed in comparison with 
the baseline cost achieved by the current practice of 
statically provisioning the servers and using only the 
power grid. The cost reductions are quite significant 
and make a strong case for utilizing our joint cost op- 
timization framework. Furthermore, our online algo- 
rithms obtain almost the same cost reduction as the 
offline optimal solution even with a small look-ahead 
of 6 hours, indicating the value of short-term predic- 
tion of inputs. 

• A hybrid data center provides about 13% additional 
cost reduction over an on-grid data center representing 
the additional cost benefits that on-site power genera- 
tion provides over using the grid alone. Interestingly, 
it is sufficient to deploy a partial on-site generation 
capacity that provides 60% of the peak power require- 
ments of the data center to obtain over 95% of the 
additional cost reduction. This provides strong moti- 
vation for a traditional on-grid data center to deploy 
at least a partial on-site generation capability to save 
costs. 



2. THE DATA CENTER COST MINIMIZATION PROBLEM

We consider the scenario where a data center can jointly 
optimize energy production, procurement, and consumption 
so as to minimize its operating expenses. We refer to this 
data center cost minimization problem as DCM. To study 
DCM, we model how energy is produced using on-site power 
generators, how it can be procured from the power grid, 
and how data center capacity can be provisioned dynami- 
cally in response to workload. While some of these aspects 
have been studied independently, our work is unique in opti- 
mizing these dimensions simultaneously as next-generation 
data centers can. Our algorithms minimize cost by use of 
techniques such as: (i) dynamic capacity provisioning of 
servers - turning off unnecessary servers when workload is 
low to reduce the energy consumption, (ii) opportunistic
energy procurement - opting between the on-site and grid
energy sources to exploit price fluctuation, and (iii) dynamic
provisioning of generators - orchestrating which generators 
produce what portion of the energy demand. While prior 
literature has considered these techniques in isolation, we 
show how they can be used in coordination to manage both 
the supply and demand of power to achieve substantial cost 
reduction. 

Notation             Definition
-------------------  ------------------------------------------------
$T$                  Number of time slots
$N$                  Number of on-site generators
$\beta_s$            Switching cost of a server ($)
$\beta_g$            Startup cost of an on-site generator ($)
$c_m$                Sunk cost of maintaining a generator in its
                     active state per slot ($)
$c_o$                Incremental cost for an active generator to
                     output an additional unit of energy ($/Wh)
$L$                  The maximum output of a generator (Watt)
$a(t)$               Workload at time $t$
$p(t)$               Price per unit energy drawn from the grid
                     at $t$ ($P_{\min} \le p(t) \le P_{\max}$) ($/Wh)
$x(t)$               Number of active servers at $t$
$s(t)$               Total server service capability at $t$
$v(t)$               Grid power used at $t$ (Watt)
$y(t)$               Number of active on-site generators at $t$
$u(t)$               Total power output from active generators
                     at $t$ (Watt)
$g_t(x(t), a(t))$    Total power consumption as a function of
                     $x(t)$ and $a(t)$ at $t$ (Watt)

Note: we use bold symbols to denote vectors, e.g.,
$\boldsymbol{x} \triangleq (x(t))$. Brackets indicate the unit.

Table 2: Key notation.



2.1 Model Assumptions 

We adopt a discrete-time model whose time slot matches
the timescale at which the scheduling decisions can be updated.
Without loss of generality, we assume there are $T$ slots in
total, each of unit length.

Workload model. Similar to existing work [13], [34], [16],
we consider a "mice" type of workload for the data cen- 
ter where each job has a small transaction size and short 
duration. Jobs arriving in a slot get served in the same 
slot. Workload can be split among active servers at arbitrary
granularity like a fluid. These assumptions model a
"request-response" type of workload that characterizes serving
web content or hosted application services that entail
short but real-time interactions between the user and the 
server. The workload to be served at time t is represented 
by a(t). Note that we do not rely on any specific stochastic 
model of a(t). 

Server model. We assume that the data center consists
of a sufficient number of homogeneous servers, each with
unit service capacity (i.e., it can serve at most one unit of
workload per slot) and the same power consumption model. Let
$x(t)$ be the number of active servers and $s(t) \in [0, x(t)]$ be
the total server service capability at time $t$. It is clear that
$s(t)$ should be no smaller than $a(t)$ to get the workload served
in the same slot. We model the aggregate server power consumption
as $b(t) = f_s(x(t), s(t))$, an increasing and convex
function of $x(t)$ and $s(t)$. That is, the first and second order
partial derivatives in $x(t)$ and $s(t)$ are all non-negative.
Since $f_s(x(t), s(t))$ is increasing in $s(t)$, it is optimal to
always set $s(t) = a(t)$. Thus, we have $b(t) = f_s(x(t), a(t))$
and $x(t) \ge a(t)$.

This power consumption model is quite general and cap- 
tures many common server models. One example is the 
commonly adopted standard linear model [pj: 

$$f_s(x(t), a(t)) = c_{idle}\, x(t) + (c_{peak} - c_{idle})\, a(t),$$

where $c_{idle}$ and $c_{peak}$ are the power consumed by a server
in the idle and fully utilized states, respectively. Most servers
today consume significant amounts of power even when idle.
A holy grail for server design is to make them "power
proportional" by making $c_{idle}$ zero [32].

Besides, turning a server on entails a switching cost [28],
denoted as $\beta_s$, which includes the amortized service
interruption cost, the wear-and-tear cost (e.g., component
procurement and replacement cost, hard disks in particular),
and the risk associated with server switching. It is comparable
to the energy cost of running a server for several hours [23].

In addition to servers, power conditioning and cooling
systems also consume a significant portion of power. The
three$^1$ contribute about 94% of overall power consumption
and their power draw varies drastically with server utilization
[33]. Thus, it is important to model the power consumed by
power conditioning and cooling systems.

Power conditioning system model. The power conditioning
system usually includes power distribution units (PDUs)
and uninterruptible power supplies (UPSs). PDUs transform
the high voltage power distributed throughout the data
center to voltage levels appropriate for servers. UPSs provide
temporary power during an outage. We model the power
consumption of this system as $f_p(b(t))$, an increasing and
convex function of the aggregate server power consumption
$b(t)$.

This model is general, and one example is a quadratic function
adopted in a comprehensive study on the data center
power consumption [33]: $f_p(b(t)) = C_1 + \pi_1 b^2(t)$, where
$C_1 > 0$ and $\pi_1 > 0$ are constants depending on specific
PDUs and UPSs.

Cooling system model. We model the power consumed
by the cooling system as $f_c^t(b(t))$, a time-dependent (e.g.,
depending on ambient weather conditions) increasing and convex
function of $b(t)$.



1 The other two, networking and lighting, consume little 
power and have less to do with server utilization. Thus, 
we do not model the two in this paper. 



This cooling model captures many common cooling systems.
According to [24], the power consumption of an outside
air cooling system can be modelled as a time-dependent
cubic function of $b(t)$: $f_c^t(b(t)) = K_t b^3(t)$, where $K_t > 0$
depends on ambient weather conditions, such as air temperature,
at time $t$. According to [33], the power draw of a water
chiller cooling system can be modelled as a time-dependent
quadratic function of $b(t)$: $f_c^t(b(t)) = Q_t b^2(t) + L_t b(t) + C_t$,
where $Q_t, L_t, C_t \ge 0$ depend on outside air and chilled water
temperature at time $t$. Note that all we need is that $f_c^t(b(t))$ is
increasing and convex in $b(t)$.

On-site generator model. We assume that the data 
center has N units of homogeneous on-site generators, each 
having a power output capacity $L$. Similar to generator
models studied in the unit commitment problem [2D], we 
define a generator startup cost $\beta_g$, which typically involves
the heating-up cost and the additional maintenance cost due to
each startup (e.g., fatigue and possible permanent damage
resulting from stresses during startups), $c_m$ as the sunk cost of
maintaining a generator in its active state for a slot, and
$c_o$ as the incremental cost for an active generator to output
an additional unit of energy. Thus, the total cost for $y(t)$
active generators that output $u(t)$ units of energy at time $t$
is $c_m y(t) + c_o u(t)$.

Grid model. The grid supplies energy to the data center 
in an "on-demand" fashion, with time-varying price p(t) per 
unit energy at time t. Thus, the cost of drawing v(t) units 
of energy from the grid at time $t$ is $p(t)v(t)$. Without loss
of generality, we assume $0 \le P_{\min} \le p(t) \le P_{\max}$.

To keep the study interesting and practically relevant, we
make the following assumptions: (i) the server and generator
turning-on costs are strictly positive, i.e., $\beta_s > 0$ and
$\beta_g > 0$; (ii) $c_o + c_m/L < P_{\max}$. This ensures that the
minimum on-site energy price is cheaper than the maximum grid
energy price. Otherwise, it should be clear that it is optimal to
always buy energy from the grid, because in that case the
grid energy is cheaper and incurs no startup costs.

2.2 Problem Formulation 

Based on the above models, the data center total power
consumption is the sum of the server, power conditioning
system and cooling system power draw, which can be expressed
as a time-dependent function of $b(t)$ (with
$b(t) = f_s(x(t), a(t))$):

$$b(t) + f_p(b(t)) + f_c^t(b(t)) \triangleq g_t(x(t), a(t)).$$

We remark that gt(x(t),a(t)) is increasing and convex in x(t) 
and a(t). This is because it is the sum of three increasing 
and convex functions. Note that all results we derive in this 
paper apply to any gt(x,a) as long as it is increasing and 
convex in x and a. 
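
As a rough illustration of how $g_t$ is assembled, the sketch below composes the linear server model, the quadratic power conditioning model, and the cubic outside-air cooling model mentioned above. All numeric parameter values (`c_idle`, `c_peak`, `C1`, `pi1`, `K_t`) are hypothetical placeholders chosen for readability, not values from the paper.

```python
def server_power(x, a, c_idle=100.0, c_peak=200.0):
    """Linear server model b = c_idle * x + (c_peak - c_idle) * a, in Watts.

    c_idle and c_peak are hypothetical per-server power draws.
    """
    return c_idle * x + (c_peak - c_idle) * a


def conditioning_power(b, C1=50.0, pi1=1e-5):
    """Quadratic power conditioning model f_p(b) = C1 + pi1 * b^2 (hypothetical constants)."""
    return C1 + pi1 * b ** 2


def cooling_power(b, K_t=1e-9):
    """Cubic outside-air cooling model K_t * b^3; K_t is a hypothetical weather coefficient."""
    return K_t * b ** 3


def total_power(x, a, K_t=1e-9):
    """Total demand g_t(x, a) = b + f_p(b) + f_c^t(b) for x active servers and workload a."""
    if x < a:
        raise ValueError("need x >= a so the workload can be served in the slot")
    b = server_power(x, a)
    return b + conditioning_power(b) + cooling_power(b, K_t)


if __name__ == "__main__":
    # Ten active servers serving five units of workload.
    print(total_power(10, 5))
```

Because each component is increasing and convex in $b$, and $b$ is linear in $x$ and $a$ here, the composed function stays increasing and convex, which is the only property the analysis relies on.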

Our objective is to minimize the data center total cost over
the entire horizon $[1, T]$, which is given by

$$\mathrm{Cost}(\boldsymbol{x}, \boldsymbol{y}, \boldsymbol{u}, \boldsymbol{v}) = \sum_{t=1}^{T} \Big\{ v(t)p(t) + c_o u(t) + c_m y(t) + \beta_s [x(t) - x(t-1)]^+ + \beta_g [y(t) - y(t-1)]^+ \Big\}, \tag{1}$$

which includes the cost of grid electricity, the running cost
of on-site generators, and the switching cost of servers and
on-site generators in the entire horizon $[1, T]$. Throughout
this paper, we set the initial condition $x(0) = y(0) = 0$.
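
The cost in (1) can be evaluated directly for any candidate schedule. The following is a minimal sketch, assuming the schedules are given as Python lists indexed from slot 1 (list index 0) and that $x(0) = y(0) = 0$; the function name `total_cost` and the sample numbers are illustrative only.

```python
def total_cost(x, y, u, v, p, c_o, c_m, beta_s, beta_g):
    """Evaluate the cost in (1) for schedules x, y, u, v and grid prices p.

    Index k of each list corresponds to time slot t = k + 1.
    Only increases in x or y incur switching cost ([.]^+ = max(0, .)).
    """
    cost = 0.0
    x_prev, y_prev = 0, 0  # initial condition x(0) = y(0) = 0
    for t in range(len(p)):
        cost += v[t] * p[t] + c_o * u[t] + c_m * y[t]   # grid + generator running cost
        cost += beta_s * max(0, x[t] - x_prev)           # server switching cost
        cost += beta_g * max(0, y[t] - y_prev)           # generator startup cost
        x_prev, y_prev = x[t], y[t]
    return cost


if __name__ == "__main__":
    # Two slots with hypothetical numbers.
    print(total_cost(x=[2, 1], y=[1, 1], u=[3, 3], v=[1, 0], p=[2, 5],
                     c_o=1.0, c_m=0.5, beta_s=4.0, beta_g=10.0))
```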

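We formally define the data center cost minimization problem
as a non-linear mixed-integer program, given the workload
$a(t)$, the grid price $p(t)$ and the time-dependent function
$g_t(x, a)$, for $1 \le t \le T$, as time-varying inputs.

$$\begin{aligned}
\min_{\boldsymbol{x}, \boldsymbol{y}, \boldsymbol{u}, \boldsymbol{v}} \quad & \mathrm{Cost}(\boldsymbol{x}, \boldsymbol{y}, \boldsymbol{u}, \boldsymbol{v}) && (2) \\
\text{s.t.} \quad & u(t) + v(t) \ge g_t(x(t), a(t)), && (3) \\
& u(t) \le L y(t), && (4) \\
& x(t) \ge a(t), && (5) \\
& y(t) \le N, && (6) \\
& x(0) = y(0) = 0, && (7) \\
\text{var} \quad & x(t), y(t) \in \mathbb{N}^0,\ u(t), v(t) \in \mathbb{R}_0^+,\ t \in [1, T],
\end{aligned}$$

where $[\cdot]^+ = \max(0, \cdot)$, and $\mathbb{N}^0$ and $\mathbb{R}_0^+$
represent the sets of non-negative integers and non-negative real
numbers, respectively.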

Constraint (3) ensures the total power consumed by the
data center is jointly supplied by the generators and the grid.
Constraint (4) captures the maximal output of the on-site
generators. Constraint (5) specifies that there are enough
active servers to serve the workload. Constraint (6) is the
generator number constraint. Constraint (7) is the boundary
condition.

Note that this problem is challenging to solve. First, it is a
non-linear mixed-integer optimization problem. Further, the
objective function values across different slots are correlated
via the switching costs $\beta_s [x(t) - x(t-1)]^+$ and
$\beta_g [y(t) - y(t-1)]^+$, and thus cannot be decomposed. Finally,
to obtain an online solution we do not even know the inputs
beyond the current slot.

Next, we introduce a proposition to simplify the structure
of the problem. Note that if $(x(t))_{t=1}^{T}$ and $(y(t))_{t=1}^{T}$ are
given, the problem in (2)-(7) reduces to a linear program
and can be solved independently for each slot. We then
obtain the following.

Proposition 1. Given any $x(t)$ and $y(t)$, the $u(t)$ and
$v(t)$ that minimize the cost in (2) with any $g_t(x, a)$ that is
increasing in $x$ and $a$ are given by: $\forall t \in [1, T]$,

$$u(t) = \begin{cases} 0, & \text{if } p(t) \le c_o, \\ \min\big(L y(t),\ g_t(x(t), a(t))\big), & \text{otherwise,} \end{cases}$$

and

$$v(t) = g_t(x(t), a(t)) - u(t).$$

Note that $u(t)$, $v(t)$ can be computed using only $x(t)$, $y(t)$ at
the current time $t$, and thus can be determined in an online fashion.
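
The per-slot rule in Proposition 1 translates almost literally into code. The sketch below is one possible rendering; the function name `dispatch` and the sample prices are illustrative assumptions, and `demand` stands for the already-computed value $g_t(x(t), a(t))$.

```python
def dispatch(p_t, demand, y_t, L, c_o):
    """Split the slot's power demand between on-site generation u and grid power v.

    p_t:    grid price in this slot
    demand: total power demand g_t(x(t), a(t))
    y_t:    number of active generators
    L:      maximum output of one generator
    c_o:    incremental cost of on-site energy
    """
    if p_t <= c_o:
        u = 0.0                      # grid is no more expensive: buy everything from it
    else:
        u = min(L * y_t, demand)     # use on-site power up to capacity or demand
    v = demand - u                   # the grid covers the remainder
    return u, v


if __name__ == "__main__":
    print(dispatch(p_t=0.05, demand=100.0, y_t=2, L=30.0, c_o=0.08))  # cheap grid
    print(dispatch(p_t=0.10, demand=100.0, y_t=2, L=30.0, c_o=0.08))  # expensive grid
```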

Intuitively, the above proposition says that if the on-site
energy price $c_o$ is higher than the grid price $p(t)$, we should
buy energy from the grid; otherwise, it is best to buy
the cheap on-site energy up to its maximum supply $L \cdot y(t)$
and the rest (if any) from the more expensive grid. With
the above proposition, we can reduce the non-linear mixed-integer
program in (2)-(7) with variables $\boldsymbol{x}$, $\boldsymbol{y}$, $\boldsymbol{u}$, and $\boldsymbol{v}$ to
the following integer program with only variables $\boldsymbol{x}$ and $\boldsymbol{y}$:

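$$\begin{aligned}
\textbf{DCM}: \quad \min \quad & \sum_{t=1}^{T} \Big\{ \psi\big(y(t), p(t), d_t(x(t))\big) + \beta_s [x(t) - x(t-1)]^+ + \beta_g [y(t) - y(t-1)]^+ \Big\} && (8) \\
\text{s.t.} \quad & x(t) \ge a(t), \\
\text{var} \quad & x(t), y(t) \in \mathbb{N}^0,\ t \in [1, T],
\end{aligned}$$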

where $d_t(x(t)) \triangleq g_t(x(t), a(t))$, introduced for ease of
presentation in later sections, is increasing and convex in $x(t)$, and
$\psi(y(t), p(t), d_t(x(t)))$ replaces the term $v(t)p(t) + c_o u(t) +
c_m y(t)$ in the original cost function in (2) and is defined as

$$\psi\big(y(t), p(t), d_t(x(t))\big) \triangleq \begin{cases}
c_m y(t) + p(t)\, d_t(x(t)), & \text{if } p(t) \le c_o, \\
c_m y(t) + c_o L y(t) + p(t)\big(d_t(x(t)) - L y(t)\big), & \text{if } p(t) > c_o \text{ and } d_t(x(t)) > L y(t), \\
c_m y(t) + c_o\, d_t(x(t)), & \text{else.}
\end{cases} \tag{9}$$

As a result of the analysis above, it suffices to solve the 
above formulation of DCM with only variables x and y, in 
order to minimize the data center operating cost. 
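
To make the three cases of (9) concrete, here is a small sketch of the per-slot energy cost once $u(t)$ and $v(t)$ have been eliminated. The name `psi` mirrors the symbol in the text; the argument `d_t` is the already-evaluated demand $d_t(x(t))$, and the sample numbers are hypothetical.

```python
def psi(y_t, p_t, d_t, L, c_o, c_m):
    """Per-slot energy cost for y_t active generators, grid price p_t, and demand d_t."""
    if p_t <= c_o:
        # Grid is no more expensive than on-site energy: serve all demand from the grid.
        return c_m * y_t + p_t * d_t
    if d_t > L * y_t:
        # Generators run at full capacity; the grid covers the shortfall.
        return c_m * y_t + c_o * L * y_t + p_t * (d_t - L * y_t)
    # Generators alone can cover the demand.
    return c_m * y_t + c_o * d_t


if __name__ == "__main__":
    # Hypothetical numbers: 2 generators of capacity 30, c_o = 2, c_m = 5.
    print(psi(2, 1.0, 100.0, 30.0, 2.0, 5.0))  # cheap grid
    print(psi(2, 3.0, 100.0, 30.0, 2.0, 5.0))  # expensive grid, demand above capacity
    print(psi(2, 3.0, 50.0, 30.0, 2.0, 5.0))   # expensive grid, demand within capacity
```

Note that keeping a generator active still costs $c_m$ per slot in the first case even though it outputs nothing, which is why the choice of $y(t)$ matters even when the grid is cheap.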

2.3 An Offline Optimal Algorithm 

We present an offline optimal algorithm for solving problem
DCM using Dijkstra's shortest path algorithm [15].
We construct a graph $G = (V, E)$, where each vertex denoted
by the tuple $(x, y, t)$ represents a state of the data
center where there are $x$ active servers and $y$ active generators
at time $t$. We draw a directed edge from each vertex
$(x(t-1), y(t-1), t-1)$ to each possible vertex $(x(t), y(t), t)$
to represent the fact that the data center can transit from
the first state to the second state. Further, we associate
the cost of that transition, shown below, as the weight of the
edge:

$$\psi\big(y(t), p(t), d_t(x(t))\big) + \beta_s [x(t) - x(t-1)]^+ + \beta_g [y(t) - y(t-1)]^+.$$

Next, we find the minimum weighted path from the initial
state represented by vertex $(0, 0, 0)$ to the final state
represented by vertex $(0, 0, T+1)$ by running Dijkstra's algorithm
on graph $G$. Since the weights represent the transition costs,
it is clear that finding the minimum weighted path in $G$ is
equivalent to minimizing the total transitional costs. Thus,
our offline algorithm provides an optimal solution for problem
DCM.

Theorem 1. The algorithm described above finds an optimal
solution to problem DCM in time $O(M^2 N^2 T \log(MNT))$,
where $T$ is the number of slots, $N$ the number of generators
and $M = \max_{1 \le t \le T} \lceil a(t) \rceil$.

Proof. Since the numbers of active servers and generators
are at most $M$ and $N$, respectively, and there are $T + 2$
time slots, graph $G$ has $O(MNT)$ vertices and $O(M^2 N^2 T)$
edges. Thus, the run time of Dijkstra's algorithm on graph
$G$ is $O(M^2 N^2 T \log(MNT))$. □

Remark: In practice, the time-varying input sequences 
($p(t)$, $a(t)$ and $g_t$) may not be available in advance and hence
it may be difficult to apply the above offline algorithm. How- 
ever, an offline optimal algorithm can serve as a benchmark, 
using which we can evaluate the performance of online algo- 
rithms. 
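
The state graph is layered by time, so the shortest path can also be found by a slot-by-slot dynamic program instead of Dijkstra's algorithm; on this graph both return the same minimum-cost path. The sketch below uses that substitution. It assumes $T \ge 1$, that moving to the final state $(0, 0, T+1)$ costs nothing (switching off is free in the cost model), and that demand is supplied as a hypothetical callable `d(t, x)` with a zero-based slot index.

```python
from math import ceil


def solve_dcm_offline(a, p, d, N, L, c_o, c_m, beta_s, beta_g):
    """Layered shortest-path (dynamic programming) solver for problem DCM.

    a, p: workload and grid price per slot (lists of length T >= 1)
    d:    callable d(t, x) returning the power demand d_t(x) for slot index t
    Returns (minimum cost, server schedule x, generator schedule y).
    """
    T = len(a)
    M = max(ceil(w) for w in a)  # at most M servers are ever needed

    def psi(y, price, demand):
        if price <= c_o:
            return c_m * y + price * demand
        if demand > L * y:
            return c_m * y + c_o * L * y + price * (demand - L * y)
        return c_m * y + c_o * demand

    best = {(0, 0): 0.0}  # cheapest cost of reaching each state in the previous layer
    back = []             # back[t][state] = predecessor state on the cheapest path
    for t in range(T):
        nxt, choice = {}, {}
        for x in range(ceil(a[t]), M + 1):      # enforce x(t) >= a(t)
            demand = d(t, x)
            for y in range(N + 1):
                cost, prev = min(
                    ((c + beta_s * max(0, x - xp) + beta_g * max(0, y - yp), (xp, yp))
                     for (xp, yp), c in best.items()),
                    key=lambda item: item[0])
                nxt[(x, y)] = cost + psi(y, p[t], demand)
                choice[(x, y)] = prev
        back.append(choice)
        best = nxt

    state = min(best, key=best.get)
    total = best[state]
    xs, ys = [], []
    for t in range(T - 1, -1, -1):              # walk the path backwards
        xs.append(state[0])
        ys.append(state[1])
        state = back[t][state]
    return total, xs[::-1], ys[::-1]


if __name__ == "__main__":
    # Tiny hypothetical instance: demand equals the number of active servers.
    print(solve_dcm_offline(a=[1, 1], p=[1.0, 10.0], d=lambda t, x: x,
                            N=1, L=1, c_o=2.0, c_m=1.0, beta_s=3.0, beta_g=4.0))
```

In the sample instance the generator stays off while the grid is cheap and is started in the second slot when the grid price exceeds $c_o$. Each layer examines every pair of states, which matches the $O(M^2 N^2 T)$ edge count in the proof above.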

3. THE BENEFIT OF JOINT OPTIMIZATION 

Data center cost minimization (DCM) entails the joint 
optimization of both server capacity that determines the en- 
ergy demand and on-site power generation that determines 
the energy supply. Now consider the situation where the 
data center optimizes the energy demand and supply sepa- 
rately. 



First, the data center dynamically provisions the server
capacity according to the grid power price $p(t)$. More formally,
it solves the capacity provisioning problem which we
refer to as CP below.

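$$\begin{aligned}
\textbf{CP}: \quad \min \quad & \sum_{t=1}^{T} \Big\{ p(t) \cdot d_t(x(t)) + \beta_s [x(t) - x(t-1)]^+ \Big\} \\
\text{s.t.} \quad & x(t) \ge a(t), \\
& x(0) = 0, \\
\text{var} \quad & x(t) \in \mathbb{N}^0,\ t \in [1, T].
\end{aligned}$$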

Solving problem CP yields $\bar{x}$. Thus, the total power demand
at time $t$ given $\bar{x}(t)$ is $d_t(\bar{x}(t))$. Note that $d_t(\bar{x}(t))$ is
not just server power consumption, but also includes the
consumption of power conditioning and cooling systems, as
described in Sec. 2.2.

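Second, the data center minimizes the cost of satisfying
the power demand due to $d_t(\bar{x}(t))$, using both the grid and
the on-site generators. Specifically, it solves the energy
provisioning problem which we refer to as EP below.

$$\begin{aligned}
\textbf{EP}: \quad \min \quad & \sum_{t=1}^{T} \Big\{ \psi\big(y(t), p(t), d_t(\bar{x}(t))\big) + \beta_g [y(t) - y(t-1)]^+ \Big\} \\
\text{s.t.} \quad & y(0) = 0, \\
\text{var} \quad & y(t) \in \mathbb{N}^0,\ t \in [1, T].
\end{aligned}$$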

Let $(\bar{x}, \bar{y})$ be the solution obtained by solving CP and
EP separately in sequence and $(x^*, y^*)$ be the solution
obtained by solving the joint optimization DCM. Further, let
$C_{DCM}(x, y)$ be the value of the data center's total cost for
solution $(x, y)$, including both generator and server costs as
represented by the objective function (8) of problem DCM.
The additional benefit of joint optimization over optimizing
independently is simply the relationship between $C_{DCM}(\bar{x}, \bar{y})$
and $C_{DCM}(x^*, y^*)$. It is clear that $(\bar{x}, \bar{y})$ obeys all the
constraints of DCM and hence is a feasible solution of DCM.
Thus, $C_{DCM}(x^*, y^*) \le C_{DCM}(\bar{x}, \bar{y})$. We can measure the
factor loss in optimality $\rho$ due to optimizing separately as
opposed to optimizing jointly on the worst-case input as follows:

$$\rho \triangleq \max_{\text{all inputs}} \frac{C_{DCM}(\bar{x}, \bar{y})}{C_{DCM}(x^*, y^*)}.$$

The following theorem characterizes the benefit of joint
optimization over optimizing independently.

Theorem 2. The factor loss in optimality $\rho$ by solving
the problems CP and EP in sequence as opposed to optimizing
jointly is given by $\rho = LP_{\max}/(Lc_o + c_m)$ and it is tight.

Proof. Refer to the appendix. □

The above theorem guarantees that for any time duration
$T$, any workload $a$, any grid price $p$ and any function
$g_t(x, a)$ as long as it is increasing and convex in $x$ and
$a$, solving problem DCM by first solving CP then solving
EP in sequence yields a solution that is within a factor
$LP_{\max}/(Lc_o + c_m)$ of solving DCM directly. Further, the
ratio is tight in that there exists an input to DCM where the
ratio $C_{DCM}(\bar{x}, \bar{y})/C_{DCM}(x^*, y^*)$ equals $LP_{\max}/(Lc_o + c_m)$.
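
For a feel of the magnitudes involved, the bound in Theorem 2 can be evaluated for a given generator and price cap. The numbers in the sketch below are hypothetical and chosen only so that the model assumption $c_o + c_m/L < P_{\max}$ holds; the function name is likewise illustrative.

```python
def separation_loss_bound(L, P_max, c_o, c_m):
    """Worst-case factor loss rho = L * P_max / (L * c_o + c_m) from Theorem 2."""
    if not (c_o + c_m / L < P_max):
        raise ValueError("model assumes c_o + c_m / L < P_max")
    return L * P_max / (L * c_o + c_m)


if __name__ == "__main__":
    # Hypothetical generator: capacity 10, c_o = 0.08, c_m = 0.2; grid price cap 0.2.
    print(separation_loss_bound(L=10.0, P_max=0.2, c_o=0.08, c_m=0.2))
```

Equivalently, $\rho$ is the ratio of the maximum grid price to the cheapest achievable on-site price $c_o + c_m/L$, so it grows as the grid price cap rises relative to on-site generation.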

The theorem shows in a quantitative way that a larger
price discrepancy between the maximum grid price and the
on-site power yields a larger gain by optimizing the energy
provisioning and capacity provisioning jointly. Over the past
decade, utilities have been exposing a greater level of grid
price variation to their customers with mechanisms such as
time-of-use pricing where grid prices are much more expensive
during peak hours than during the off-peak periods.
This likely leads to a larger price discrepancy between the grid
and the on-site power. In that case, our result implies that
a joint optimization of power and server resources is likely
to yield more benefits to a hybrid data center.

Table 3: Comparison of the algorithm GCSR proposed in
this paper, CSR in [27], and LCP in [23]. Note that $\alpha_s$ is
the normalized look-ahead window size, whose representations
are different under the different settings of [27] and our work.

Besides characterizing the benefit of jointly optimizing power and server resources, the decomposition of problem DCM into problems CP and EP provides a key approach for our online algorithm design. Problem DCM has an objective function with mutually dependent, coupled variables $x$ and $y$ indicating the server and generator states, respectively. This coupling (specifically through the function $\psi(y(t), p(t), d_t(x(t)))$) makes it difficult to design provably good online algorithms. However, instead of solving problem DCM directly, we devise online algorithms to solve problem CP, which involves only the server variables $x$, and problem EP, which involves only the generator variables $y$. Combining the online algorithms for CP and EP yields the desired online algorithm for DCM.

4. ONLINE ALGORITHMS FOR ON-GRID
DATA CENTERS

We first develop an online algorithm for DCM for an on-grid data center, where there is no on-site power generation, a scenario that captures most data centers today. Since an on-grid data center has no on-site power generation, solving DCM for it reduces to solving problem CP described in Sec. 3.

Problems of this kind have been studied in the literature (see e.g., [23], [27]). The differences between our work and [23], [27] are as follows (also summarized in Table 3). From the modelling aspect, we explicitly take into account the power consumption of both cooling and power conditioning systems, in addition to servers. From the formulation aspect, we solve a different optimization problem, i.e., an integer program with a convex and increasing objective function. From the theoretical aspect, we achieve a small competitive ratio of $2 - \alpha_s$, which quickly decreases to 1 as the look-ahead window $w$ increases.

Figure 2: An example of how workload $a$ is decomposed into 4 sub-demands.

Figure 3: An example of $a_i(t)$ and the corresponding solution obtained by $\mathrm{GCSR}_s^{(w)}$.

Recall that CP takes as input the workload $a$, the grid price $p$ and the time-dependent function $g_t$, $\forall t$, and outputs the number of active servers $x$. We construct solutions to CP in a divide-and-conquer fashion. We first decompose the demand $a$ into sub-demands and define a corresponding sub-problem for each server, and then solve capacity provisioning separately for each sub-problem. Note that the key is to correctly decompose the demand and define the sub-problems so that the combined solution is still optimal. More specifically, we slice the demand as follows: for $1 \le i \le M = \max_{1 \le t \le T} \lceil a(t) \rceil$ and $1 \le t \le T$,

$$a_i(t) = \min\{1, \max\{0, a(t) - (i - 1)\}\}.$$
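The slicing rule $a_i(t) = \min\{1, \max\{0, a(t) - (i-1)\}\}$ can be illustrated with a minimal Python sketch. It assumes the workload is given as a list of non-negative floats indexed by time slot; the function name `slice_demand` and the sample workload are illustrative and not taken from the paper.

```python
import math

def slice_demand(a):
    """Split a workload a(1..T) into M = max_t ceil(a(t)) sub-demands.

    Returns a list of M lists; element [i-1][t] holds
    a_i(t) = min(1, max(0, a(t) - (i - 1))), so each value lies in [0, 1].
    """
    M = max(math.ceil(v) for v in a) if a else 0
    return [[min(1.0, max(0.0, v - (i - 1))) for v in a]
            for i in range(1, M + 1)]

# Hypothetical workload over four slots (units: number of servers' worth of load).
workload = [0.5, 2.3, 3.0, 1.2]
sub_demands = slice_demand(workload)

# The sub-demands stack back up to the original workload in every slot.
for t, total in enumerate(workload):
    assert abs(sum(s[t] for s in sub_demands) - total) < 1e-9
```

Each sub-demand can then be handed to a single server, which is what makes the per-server sub-problems simple.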

The corresponding sub-problem $\mathrm{CP}_i$ is defined as follows:

$$\mathrm{CP}_i: \quad \min \sum_{t=1}^{T} \left\{ p(t) \cdot d_t^i \cdot x_i(t) + \beta_s [x_i(t) - x_i(t-1)]^+ \right\}$$

$$\text{s.t.} \quad x_i(t) \ge a_i(t),$$

$$x_i(0) = 0,$$

$$\text{var} \quad x_i(t) \in \{0, 1\}, \; t \in [1, T],$$

where $x_i(t)$ indicates whether the $i$-th server is on at time $t$ and $d_t^i = d_t(i) - d_t(i-1)$. $d_t^i$ can be interpreted as the power consumption due to the $i$-th server at $t$.
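As a small illustration of the $\mathrm{CP}_i$ objective, the following Python sketch evaluates the cost of a given on/off schedule for one server: the energy cost $p(t) d_t^i x_i(t)$ plus a switching cost $\beta_s$ for every off-to-on transition, starting from $x_i(0) = 0$. The function name and the numbers in the example are hypothetical.

```python
def cp_i_cost(x, p, d_i, beta_s):
    """Cost of schedule x for sub-problem CP_i.

    x    : list of 0/1 server states x_i(1..T)
    p    : list of grid prices p(1..T)
    d_i  : list of marginal power draws d_t^i = d_t(i) - d_t(i-1)
    beta_s: cost of switching the server on once
    """
    cost, prev = 0.0, 0          # prev = x_i(0) = 0
    for x_t, p_t, d_t in zip(x, p, d_i):
        cost += p_t * d_t * x_t              # energy cost while the server is on
        cost += beta_s * max(0, x_t - prev)  # [x_i(t) - x_i(t-1)]^+ start-up term
        prev = x_t
    return cost

# Hypothetical instance: unit price and power, beta_s = 3, demand only in slots 1 and 6.
p = [1.0] * 6
d = [1.0] * 6
always_on = cp_i_cost([1, 1, 1, 1, 1, 1], p, d, 3.0)   # one start-up, six slots of energy
off_in_gap = cp_i_cost([1, 0, 0, 0, 0, 1], p, d, 3.0)  # two start-ups, two slots of energy
```

In this toy instance the idle gap costs more than one start-up, so switching off during the gap is cheaper than staying on.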

Problem $\mathrm{CP}_i$ solves the capacity provisioning problem with inputs workload $a_i$, grid price $p$ and $d_t^i$. The key reason for our decomposition is that $\mathrm{CP}_i$ is easier to solve, since $a_i$ takes values in $[0, 1]$ and exactly one server is required to serve each $a_i$. Generally speaking, a divide-and-conquer approach may suffer from optimality loss. Surprisingly, as the following theorem states, the individual optimal solutions for problems $\mathrm{CP}_i$ can be put together to form an optimal solution to the original problem CP. Denote $C_{\mathrm{CP}_i}(x_i)$ as the cost of solution $x_i$ for problem $\mathrm{CP}_i$ and $C_{\mathrm{CP}}(x)$ the cost of solution $x$ for problem CP.

Theorem 3. Consider problem CP with any $d_t(x(t)) = g_t(x(t), a(t))$ that is convex in $x(t)$. Let $\bar{x}_i$ be an optimal solution and $x_i^{on}$ an online solution for problem $\mathrm{CP}_i$ with workload $a_i$; then $\sum_{i=1}^{M} \bar{x}_i$ is an optimal solution for CP with workload $a$. Furthermore, if $\forall a_i, i$, we have $C_{\mathrm{CP}_i}(x_i^{on}) \le \gamma \cdot C_{\mathrm{CP}_i}(\bar{x}_i)$ for a constant $\gamma \ge 1$, then $C_{\mathrm{CP}}(\sum_{i=1}^{M} x_i^{on}) \le \gamma \cdot C_{\mathrm{CP}}(\sum_{i=1}^{M} \bar{x}_i)$, $\forall a$.

Proof. Refer to Appendix A. □

Thus, it remains to design algorithms for each $\mathrm{CP}_i$. To solve $\mathrm{CP}_i$ in an online fashion, one need only orchestrate one server to satisfy the workload $a_i$ and minimize the total cost. When $a_i(t) > 0$, we must keep the server active to satisfy the workload. The challenging part is what we should do if the server is already active but $a_i(t) = 0$. Should we turn off the server immediately or keep it idling for some time? Should we distinguish the scenarios when the grid price is high versus low?

Algorithm 1 $\mathrm{GCSR}_s^{(w)}$ for problem $\mathrm{CP}_i$

$C_i = 0$, $x_i(0) = 0$
at current time $t$, do
    Set $\tau' \leftarrow \min\{t' \in [t, t+w] \mid C_i + \sum_{\tau=t}^{t'} p(\tau) d_\tau^i \ge \beta_s\}$
    if $a_i(t) > 0$ then
        $x_i(t) = 1$ and $C_i = 0$
    else if $\tau' = \mathrm{NULL}$ or $\exists \tau \in [t, \tau']$, $a_i(\tau) > 0$ then
        $x_i(t) = x_i(t-1)$ and $C_i = C_i + p(t) d_t^i x_i(t)$
    else
        $x_i(t) = 0$ and $C_i = 0$
    end if

Inspired by "ski-rental" [15] and [27], we solve $\mathrm{CP}_i$ by the following "break-even" idea. During the idle period, i.e., $a_i(t) = 0$, we accumulate an "idling cost" and when it reaches $\beta_s$, we turn off the server; otherwise, we keep the server idling. Specifically, our online algorithm $\mathrm{GCSR}_s^{(w)}$ (Generalized Collective Server Rental) for $\mathrm{CP}_i$ has a look-ahead window $w$. At time $t$, if there exists $\tau' \in [t, t+w]$ such that the idling cost till $\tau'$ is at least $\beta_s$, we turn off the server; otherwise, we keep it idling. More formally, we have Algorithm 1 and its competitive analysis in Theorem 4. A simple example of $\mathrm{GCSR}_s^{(w)}$ is shown in Fig. 3.
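The break-even rule can be sketched in Python as follows. This is one reading of Algorithm 1 under stated assumptions, not the authors' implementation: inputs are full lists indexed by slot, but at slot `t` the loop only inspects entries up to `t + w`; the function name `gcsr_single` and the toy inputs are illustrative.

```python
def gcsr_single(a_i, p, d_i, beta_s, w):
    """Online on/off schedule for one server with look-ahead window w.

    a_i : sub-demand a_i(t) in [0, 1] per slot
    p   : grid price per slot
    d_i : marginal power d_t^i of this server per slot
    Returns the list of states x_i(t) in {0, 1}.
    """
    T = len(a_i)
    x, prev, idle_cost = [], 0, 0.0           # x_i(0) = 0, accumulated idling cost C_i = 0
    for t in range(T):
        horizon = min(t + w, T - 1)
        # tau' = earliest slot in [t, t+w] at which C_i plus the idling cost
        # of slots t..tau' reaches beta_s (None if it is not reached in the window).
        tau_prime, acc = None, idle_cost
        for tau in range(t, horizon + 1):
            acc += p[tau] * d_i[tau]
            if acc >= beta_s:
                tau_prime = tau
                break
        if a_i[t] > 0:                         # demand present: server must be on
            cur, idle_cost = 1, 0.0
        elif tau_prime is None or any(a_i[tau] > 0 for tau in range(t, tau_prime + 1)):
            cur = prev                         # keep the current state and keep counting
            idle_cost += p[t] * d_i[t] * cur
        else:                                  # break-even point reached with no demand in sight
            cur, idle_cost = 0, 0.0
        x.append(cur)
        prev = cur
    return x

# Toy instance: unit price and power, beta_s = 3, demand in the first and last slot.
demand = [1, 0, 0, 0, 0, 1]
no_lookahead = gcsr_single(demand, [1.0] * 6, [1.0] * 6, 3.0, w=0)
with_lookahead = gcsr_single(demand, [1.0] * 6, [1.0] * 6, 3.0, w=2)
```

Without look-ahead the server idles until the accumulated idling cost reaches the switching cost; with a window of two slots it can see that the gap is long enough and switches off right away.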

Our online algorithm for CP, denoted as $\mathrm{GCSR}^{(w)}$, first employs $\mathrm{GCSR}_s^{(w)}$ to solve each $\mathrm{CP}_i$ on workload $a_i$, $1 \le i \le M$, in an online fashion to produce output $x_i^{on}$, and then simply outputs $\sum_{i=1}^{M} x_i^{on} = x^{on}$ as the output for the original problem CP.

Theorem 4. $\mathrm{GCSR}_s^{(w)}$ achieves a competitive ratio of $2 - \alpha_s$ for $\mathrm{CP}_i$, where $\alpha_s = \min(1, w d_{\min} P_{\min} / \beta_s) \in [0, 1]$ is a "normalized" look-ahead window size and $d_{\min} = \min_t \{d_t(1) - d_t(0)\}$. Hence, according to Theorem 3, $\mathrm{GCSR}^{(w)}$ achieves the same competitive ratio for CP. Further, no deterministic online algorithm with a look-ahead window $w$ can achieve a smaller competitive ratio.

Proof. Refer to Appendix C. □

A consequence of Theorem 4 is that when the look-ahead window size $w$ reaches a break-even interval $\Delta_s = \beta_s / (d_{\min} P_{\min})$, our online algorithm has a competitive ratio of 1. That is, having a look-ahead window larger than $\Delta_s$ will not decrease the cost any further.
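To make the relation between the window size and the guarantee concrete, the short Python sketch below evaluates $\Delta_s = \beta_s / (d_{\min} P_{\min})$, $\alpha_s = \min(1, w / \Delta_s)$ and the ratio $2 - \alpha_s$. The parameter values are hypothetical and chosen only for illustration.

```python
def break_even_interval(beta_s, d_min, p_min):
    """Delta_s: idle time after which idling costs as much as one server start-up."""
    return beta_s / (d_min * p_min)

def normalized_lookahead(w, beta_s, d_min, p_min):
    """alpha_s = min(1, w * d_min * P_min / beta_s), a value in [0, 1]."""
    return min(1.0, w * d_min * p_min / beta_s)

def gcsr_competitive_ratio(w, beta_s, d_min, p_min):
    """Competitive ratio 2 - alpha_s stated in Theorem 4."""
    return 2.0 - normalized_lookahead(w, beta_s, d_min, p_min)

# Hypothetical parameters: beta_s in dollars, d_min in KW, p_min in $/KWh, w in hours.
BETA_S, D_MIN, P_MIN = 0.08, 0.1, 0.04
delta_s = break_even_interval(BETA_S, D_MIN, P_MIN)
ratios = {w: gcsr_competitive_ratio(w, BETA_S, D_MIN, P_MIN) for w in (0, 5, 10, 20, 40)}
```

The ratio starts at 2 with no look-ahead, falls linearly in `w`, and stays at 1 once `w` reaches the break-even interval.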

5. ONLINE ALGORITHMS FOR HYBRID 
DATA CENTERS 

Unlike on-grid data centers, hybrid data centers have on-site power generation and therefore have to solve both capacity provisioning (CP) and energy provisioning (EP) to solve the data center cost minimization (DCM) problem. We design an online algorithm, which we call DCMON, that solves DCM as follows.

1. Run algorithm GCSR from Sec. 4 to solve CP, which takes the workload $a$, the grid price $p$ and the time-dependent function $g_t$, $\forall t$, as input and produces the number of active servers $x^{on}$.

2. Run algorithm CHASE described in Sec. 5.2 below to solve EP, which takes the energy demand $d_t(x^{on}(t)) = g_t(x^{on}(t), a(t))$ and the grid price $p(t)$, $\forall t$, as input and decides when to turn on/off on-site generators and how much power to draw from the generators and the grid. Note that a similar problem has been studied in the microgrid scenarios for energy generation scheduling in our previous work [26]. In this paper, we adapt algorithm CHASE developed in [55] to our data center scenarios to solve EP in an online fashion.

For the sake of completeness, we first briefly present the design behind CHASE in Sec. 5.1 and the algorithm and its intuitions in Sec. 5.2. Then we present the combined algorithm DCMON in Sec. 5.3.

5.1 A useful structure of an offline optimal solution of EP

We first reveal an elegant structure of an offline optimal 
solution and then exploit this structure in the design of our 
online algorithm CHASE. 

5.1.1 Decompose EP into sub-problems $\mathrm{EP}_i$

For ease of presentation, we denote $e(t) = d_t(x^{on}(t))$. Similar to the decomposition of the workload when solving CP, we decompose the energy demand $e$ into $N$ sub-demands and define a sub-problem for each generator, and then solve energy provisioning separately for each sub-problem, where $N$ is the number of on-site generators. Specifically, for $1 \le i \le N$ and $1 \le t \le T$,

$$e_i(t) = \min\{L, \max\{0, e(t) - (i - 1)L\}\}.$$

The corresponding sub-problem $\mathrm{EP}_i$ is in the same form as EP except that $d_t(x(t))$ is replaced by $e_i(t)$ and $y(t)$ is replaced by $y_i(t) \in \{0, 1\}$. Using this decomposition, we can solve EP on input $e$ by simultaneously solving simpler problems $\mathrm{EP}_i$ on input $e_i$ that only involve a single generator. Theorem 5 shows that the decomposition incurs no optimality loss. Denote $C_{\mathrm{EP}_i}(y_i)$ as the cost of solution $y_i$ for problem $\mathrm{EP}_i$ and $C_{\mathrm{EP}}(y)$ the cost of solution $y$ for problem EP.

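Theorem 5. Let $\bar{y}_i$ be an optimal solution and $y_i^{on}$ an online solution for $\mathrm{EP}_i$ with energy demand $e_i$; then $\sum_{i=1}^{N} \bar{y}_i$ is an optimal solution for EP with energy demand $e$. Furthermore, if $\forall e_i, i$, we have $C_{\mathrm{EP}_i}(y_i^{on}) \le \gamma \cdot C_{\mathrm{EP}_i}(\bar{y}_i)$ for a constant $\gamma \ge 1$, then $C_{\mathrm{EP}}(\sum_{i=1}^{N} y_i^{on}) \le \gamma \cdot C_{\mathrm{EP}}(\sum_{i=1}^{N} \bar{y}_i)$, $\forall e$.

Proof. Refer to Appendix B. □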

5.1.2 Solve each sub-problem EPi 

Based on Theorem 5, it remains to design algorithms for each $\mathrm{EP}_i$. Define

$$r_i(t) = \psi(0, p(t), e_i(t)) - \psi(1, p(t), e_i(t)). \quad (10)$$

$r_i(t)$ can be interpreted as the one-slot cost difference between not using and using on-site generation. Intuitively, if $r_i(t) > 0$ (resp. $r_i(t) < 0$), it will be desirable to turn on (resp. off) the generator. However, due to the startup cost, we should not turn the generator on and off too frequently. Instead, we should evaluate whether the cumulative gain or loss in the future can offset the startup cost. This intuition motivates us to define the following cumulative cost difference $R_i(t)$. We set the initial value as $R_i(0) = -\beta_g$ and define $R_i(t)$ inductively:

$$R_i(t) \triangleq \min\{0, \max\{-\beta_g, R_i(t-1) + r_i(t)\}\}. \quad (11)$$

Note that $R_i(t)$ is only within the range $[-\beta_g, 0]$. An important feature of $R_i(t)$, useful later in the online algorithm design, is that it can be computed given the past and current inputs. An illustrative example of $R_i(t)$ is shown in Fig. 4.

Figure 4: An example of $e_i(t)$, $R_i(t)$ and the corresponding solution obtained by $\mathrm{CHASE}_s^{(w)}$.
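The recursion $R_i(t) = \min\{0, \max\{-\beta_g, R_i(t-1) + r_i(t)\}\}$ with $R_i(0) = -\beta_g$ is easy to compute incrementally, as the following Python sketch shows. It takes the one-slot differences $r_i(t)$ as given, since evaluating them requires the function $\psi$; the function name and the sample values are illustrative.

```python
def cumulative_cost_difference(r, beta_g):
    """Return [R_i(1), ..., R_i(T)] for one-slot cost differences r = [r_i(1), ..., r_i(T)].

    R_i starts at -beta_g and is clipped to the range [-beta_g, 0] after each step,
    so it only needs past and current inputs.
    """
    R, prev = [], -beta_g
    for r_t in r:
        prev = min(0.0, max(-beta_g, prev + r_t))
        R.append(prev)
    return R

# Hypothetical one-slot differences (positive: on-site generation is cheaper in that slot).
r_values = [10, 10, 10, -5, -30, 4]
R_values = cumulative_cost_difference(r_values, beta_g=24)
```

In this toy sequence the value climbs from the lower boundary to 0 after three favourable slots, and a single very unfavourable slot pushes it back down to $-\beta_g$.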

Intuitively, when $R_i(t)$ hits its boundary 0, the cost difference between not using and using on-site generation within a certain period is at least $\beta_g$, which can offset the startup cost. Thus, it makes sense to turn on the generator. Similarly, when $R_i(t)$ hits $-\beta_g$, it may be better to turn off the generator and use the grid. The following theorem formalizes this intuition, and shows an optimal solution $\bar{y}_i(t)$ for problem $\mathrm{EP}_i$ at the time epochs when $R_i(t)$ hits its boundary values $-\beta_g$ or 0.

Theorem 6. There exists an offline optimal solution for problem $\mathrm{EP}_i$, denoted by $\bar{y}_i(t)$, $1 \le t \le T$, so that:

• if $R_i(t) = -\beta_g$, then $\bar{y}_i(t) = 0$;

• if $R_i(t) = 0$, then $\bar{y}_i(t) = 1$.

Proof. Refer to Appendix D. □

5.2 Online algorithm CHASE 

Our online algorithm $\mathrm{CHASE}_s^{(w)}$ with look-ahead window $w$ exploits the insights revealed in Theorem 6 to solve $\mathrm{EP}_i$. The idea behind $\mathrm{CHASE}_s^{(w)}$ is to track the offline optimal in an online fashion. In particular, at time 0, $R_i(0) = -\beta_g$ and we set $y_i(0) = 0$. We keep tracking the value of $R_i(t)$ at every time slot within the look-ahead window. Once we observe that $R_i(t)$ hits the value $-\beta_g$ or 0, we set $y_i(t)$ to the optimal solution as Theorem 6 reveals; otherwise, we keep $y_i(t) = y_i(t-1)$ unchanged. More formally, we have Algorithm 2 and its competitive analysis in Theorem 7. An example of $\mathrm{CHASE}_s^{(w)}$ is shown in Fig. 4.

The online algorithm for EP, denoted as $\mathrm{CHASE}^{(w)}$, first employs $\mathrm{CHASE}_s^{(w)}$ to solve each $\mathrm{EP}_i$ on energy demand $e_i$, $1 \le i \le N$, in an online fashion to produce output $y_i^{on}$, and then simply outputs $\sum_{i=1}^{N} y_i^{on}$ as the output for the original problem EP.
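The tracking rule for a single generator can be sketched in Python as below. This is a hedged reading of the description above, not the authors' code: it assumes the cumulative cost differences $R_i$ are supplied as a list, uses a small tolerance `tol` (an added assumption) to detect boundary hits with floating-point inputs, and only inspects entries within the look-ahead window at each slot.

```python
def chase_single(R, beta_g, w, tol=1e-9):
    """Online generator schedule y_i(t) in {0, 1} for one sub-problem EP_i.

    R      : list of cumulative cost differences R_i(t), each in [-beta_g, 0]
    beta_g : generator start-up cost
    w      : look-ahead window size in slots
    """
    T = len(R)
    y, prev = [], 0                       # generator starts off
    for t in range(T):
        cur = prev                        # default: keep the previous state
        for tau in range(t, min(t + w, T - 1) + 1):
            if abs(R[tau]) <= tol:        # first boundary hit is 0: turn on
                cur = 1
                break
            if abs(R[tau] + beta_g) <= tol:   # first boundary hit is -beta_g: turn off
                cur = 0
                break
        y.append(cur)
        prev = cur
    return y

# Hypothetical cumulative differences with beta_g = 24.
R_example = [-14, -4, 0, -5, -24, -20]
schedule_w0 = chase_single(R_example, 24, w=0)
schedule_w2 = chase_single(R_example, 24, w=2)
```

With a look-ahead of two slots the generator is switched on as soon as the upcoming boundary hit becomes visible, rather than at the slot where the boundary is actually reached.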

Theorem 7. $\mathrm{CHASE}_s^{(w)}$ for problem $\mathrm{EP}_i$ with a look-ahead window $w$ has a competitive ratio of

$$1 + \frac{2\beta_g (L P_{\max} - L c_o - c_m)}{\cdots}.$$

Hence, according to Theorem 5, $\mathrm{CHASE}^{(w)}$ achieves the same competitive ratio for problem EP.

Proof. Refer to Appendix E. □



Algorithm 2 $\mathrm{CHASE}_s^{(w)}$ for problem $\mathrm{EP}_i$

at current time $t$, do
    Obtain $(R_i(\tau))_{\tau=t}^{t+w}$
    Set $\tau' \leftarrow \min\{\tau \in [t, t+w] \mid R_i(\tau) = 0 \text{ or } -\beta_g\}$
    if $\tau' = \mathrm{NULL}$ then
        $y_i(t) = y_i(t-1)$
    else if $R_i(\tau') = 0$ then
        $y_i(t) = 1$
    else
        $y_i(t) = 0$
    end if



where $\alpha_s = \min(1, w / \Delta_s) \in [0, 1]$ and $\alpha_g = \cdots\,[w - \Delta_s]^+ \in [0, +\infty)$ are "normalized" look-ahead window sizes.

Proof. Refer to Appendix G. □

As the look-ahead window size $w$ increases, the competitive ratio in Theorem 8 decreases to $L P_{\max} / (L c_o + c_m)$ (c.f. Fig. 5), the inherent approximation ratio introduced by our offline decomposition approach discussed in Section 3. However, the real-trace-based empirical performance of $\mathrm{DCMON}^{(w)}$ without look-ahead is already close to the offline optimal, i.e., the ratio is close to 1 (c.f. Fig. 9a).



5.3 Combining GCSR and CHASE 

Our algorithm $\mathrm{DCMON}^{(w)}$ for solving problem DCM with a look-ahead window of $w \ge 0$, i.e., knowing the grid prices $p(\tau)$, the workload $a(\tau)$ and the function $g_\tau$, $1 \le \tau \le t + w$, at time $t$, first uses GCSR from Sec. 4 to solve problem CP and then uses CHASE from Sec. 5.2 to solve problem EP. An important observation is that the available look-ahead window size for GCSR to solve CP is $w$, i.e., it knows $p(\tau)$, $a(\tau)$ and $g_\tau$, $1 \le \tau \le t + w$, at time $t$; however, the available look-ahead window size for CHASE to solve EP is only $[w - \Delta_s]^+$, i.e., it knows $p(\tau)$ and $e(\tau) = d_\tau(x^{on}(\tau))$, $1 \le \tau \le t + [w - \Delta_s]^+$, at time $t$ ($\Delta_s$ is the break-even interval defined in Sec. 4).

This is because at time $t$, $\mathrm{CHASE}^{(w)}$ knows the grid prices $p(\tau)$, the workload $a(\tau)$ and the function $g_\tau$, $1 \le \tau \le t + w$. However, not all the energy demands $(e(\tau))_{\tau=1}^{t+w}$ are known by $\mathrm{CHASE}^{(w)}$, because we derive the server state $x^{on}$ by solving problem CP using our online algorithm $\mathrm{GCSR}^{(w)}$ with $p(\tau)$, $a(\tau)$, $g_\tau$, $1 \le \tau \le t + w$. A key observation is that at time $t$ it is not possible to compute $x^{on}$ for the full look-ahead window of $t + w$, since $x^{on}(t+1), \ldots, x^{on}(t+w)$ may depend on inputs $p(\tau)$, $a(\tau)$, $g_\tau$, $\tau > t + w$, that our algorithm does not yet know. Fortunately, for $w \ge \Delta_s$ we can determine all $x^{on}(\tau)$, $1 \le \tau \le t + [w - \Delta_s]^+$, given the inputs within the full look-ahead window. That is, while we know the grid prices $p$, the workload $a$ and the function $g_t$ for the full look-ahead window $w$, the server state $x^{on}$ is known only for a smaller window of $[w - \Delta_s]^+$. Thus, the energy demand $e(\tau) = d_\tau(x^{on}(\tau)) = g_\tau(x^{on}(\tau), a(\tau))$, $1 \le \tau \le t + [w - \Delta_s]^+$, is available for $\mathrm{CHASE}^{(w)}$ at time $t$.

Thus, a bound on the competitive ratio of $\mathrm{DCMON}^{(w)}$ is the product of the competitive ratios for $\mathrm{GCSR}^{(w)}$ and $\mathrm{CHASE}^{([w - \Delta_s]^+)}$ from Theorems 4 and 7, respectively, and the optimality loss ratio $L P_{\max} / (L c_o + c_m)$ due to the offline decomposition stated in Sec. 3, which is given in the following theorem.

Theorem 8. $\mathrm{DCMON}^{(w)}$ for problem DCM has a competitive ratio of
The ratio is also upper-bounded by

6. EMPIRICAL EVALUATION

We evaluate the performance of our algorithms by simulations based on real-world traces with the aim of (i) corroborating the empirical performance of our online algorithms under various realistic settings and the impact of having look-ahead information, (ii) understanding the benefit of opportunistically procuring energy from both on-site generators and the grid, as compared to the current practice of purchasing from the grid alone, and (iii) studying how much on-site energy is needed for substantial cost benefits.

6.1 Parameters and Settings 

Workload trace: We use the workload traces from the Akamai network [1], [30], which is currently the world's largest content delivery network. The traces measure the workload of Akamai servers serving web content to actual end-users. Note that our workload is of the "request-and-response" type that we model in our paper. We use traces from the Akamai servers deployed in the New York and San Jose data centers that record the hourly average load served by each deployed server over 22 days from Dec. 21, 2008 to Jan. 11, 2009. The New York trace represents 2.5K servers that served about $1.4 \times 10^{10}$ requests and $1.7 \times 10^{13}$ bytes of content to end-users during our measurement period. The San Jose trace represents 1.5K servers that served about $5.5 \times 10^{9}$ requests and $8 \times 10^{12}$ bytes of content. We show the workload in Fig. 6, in which we normalize the load by the server's service capacity. The workload is quite characteristic in that it shows daily variations (peak versus off-peak) and weekly variations (weekday versus weekend).

Grid price: We use traces of hourly grid power prices in New York [5] and San Jose [5] for the same time period, so that they can be matched up with the workload traces (c.f. Fig. 6). Both workload and grid price traces show strong diurnal properties: in the daytime, the workload and the grid price are relatively high; at night, on the contrary, both are low. This indicates the feasibility of reducing the data center cost by using the energy from the on-site generators during the daytime and using the grid at night.

Server model: As mentioned in Sec. 2, we assume the data center has a sufficient number of homogeneous servers to serve the incoming workload at any given time. Similar to a typical setting in [32], we use the standard linear server power consumption model. We assume that each server consumes 0.25 KWh of energy per hour at full capacity and has a power proportional factor (PPF $= (c_{peak} - c_{idle}) / c_{peak}$) of 0.6, which gives us $c_{idle} = 0.1$ KW, $c_{peak} = 0.25$ KW. In addition, we assume the server switching cost equals the energy cost of running a server for 3 hours. If we assume an average grid price as the price of energy, we get about $\beta_s = \$0.08$.
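The server-model numbers can be checked with a few lines of Python. The linear power form `c_idle + (c_peak - c_idle) * load` is the usual reading of a "linear server power consumption model" and is an assumption here, as is the choice to price the 3-hour switching cost at peak power; the function names are illustrative.

```python
def idle_power(c_peak, ppf):
    """Idle power implied by PPF = (c_peak - c_idle) / c_peak."""
    return c_peak * (1.0 - ppf)

def server_power(load, c_idle, c_peak):
    """Linear power model: power in KW of one server at normalized load in [0, 1]."""
    if not 0.0 <= load <= 1.0:
        raise ValueError("load must be in [0, 1]")
    return c_idle + (c_peak - c_idle) * load

def implied_energy_price(beta_s, hours, power_kw):
    """Energy price in $/KWh that makes `hours` of running at `power_kw` cost beta_s."""
    return beta_s / (hours * power_kw)

C_PEAK, PPF = 0.25, 0.6
c_idle = idle_power(C_PEAK, PPF)                 # about 0.1 KW, matching the text
half_load = server_power(0.5, c_idle, C_PEAK)    # power at 50% utilization
# Assumption: "running a server for 3 hours" is priced at peak power.
price = implied_energy_price(0.08, 3, C_PEAK)    # average price implied by beta_s = $0.08
```

Under that assumption the stated switching cost corresponds to an average grid price of roughly 0.107 dollars per KWh.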
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Figure 6: Real-world workload from Akamai and the grid 
power price. 

Cooling and power conditioning system model: We consider a water chiller cooling system. According to [5], during this 22-day winter period the average high and low temperatures of New York are 41°F and 29°F, respectively. Those of San Jose are 58°F and 41°F, respectively. Without loss of generality, we take the high temperature as the daytime temperature and the low temperature as the nighttime temperature. Thus, according to [33], the power consumed by the water chiller cooling systems of the New York and San Jose data centers are about

$$f_{c,NY}(b) = \begin{cases} (0.041 b^2 + 0.144 b + 0.047)\, b_{\max}, & \text{at daytime}, \\ (0.03 b^2 + 0.136 b + 0.042)\, b_{\max}, & \text{at nighttime}, \end{cases}$$

and

$$f_{c,SJ}(b) = \begin{cases} (0.06 b^2 + 0.16 b + 0.054)\, b_{\max}, & \text{at daytime}, \\ (0.041 b^2 + 0.144 b + 0.047)\, b_{\max}, & \text{at nighttime}, \end{cases}$$

where $b_{\max}$ is the maximum server power consumption and $b$ is the server power consumption normalized by $b_{\max}$. The maximum server power consumption of the New York and San Jose data centers are $b_{\max}^{NY} = 2500 \times 0.25 = 625$ KW and $b_{\max}^{SJ} = 1500 \times 0.25 = 375$ KW. Besides, the power consumed by the power conditioning system, including PDUs and UPSs, is $f_p(b) = (0.012 b^2 + 0.046 b + 0.056)\, b_{\max}$ [33].
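The cooling and power conditioning models are quadratic in the normalized server power $b$, so they are straightforward to evaluate. The Python sketch below encodes the coefficients quoted above; the dictionary layout, the site and period labels, and the function names are illustrative choices, and adding the three terms into a total is an assumption about how the overall demand is formed.

```python
# Quadratic coefficients (b^2, b, constant) of the cooling model per site and period.
COOLING = {
    ("NY", "day"): (0.041, 0.144, 0.047),
    ("NY", "night"): (0.03, 0.136, 0.042),
    ("SJ", "day"): (0.06, 0.16, 0.054),
    ("SJ", "night"): (0.041, 0.144, 0.047),
}
CONDITIONING = (0.012, 0.046, 0.056)
# Maximum server power in KW: number of servers times 0.25 KW each.
B_MAX = {"NY": 2500 * 0.25, "SJ": 1500 * 0.25}

def _quadratic(coeffs, b):
    q, l, c = coeffs
    return q * b * b + l * b + c

def cooling_power(site, period, b):
    """Cooling power in KW at normalized server power b in [0, 1]."""
    return _quadratic(COOLING[(site, period)], b) * B_MAX[site]

def conditioning_power(site, b):
    """Power conditioning (PDU and UPS) power in KW at normalized server power b."""
    return _quadratic(CONDITIONING, b) * B_MAX[site]

def total_power(site, period, b):
    """Server, cooling and power conditioning power in KW (assumed additive)."""
    return b * B_MAX[site] + cooling_power(site, period, b) + conditioning_power(site, b)

ny_peak_day = total_power("NY", "day", 1.0)
```

At full server load during the day, the New York overheads in this model add roughly a third on top of the 625 KW drawn by the servers themselves.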

Generator model: We adopt generators with specifications the same as the one in [3]. The maximum output of the generator is 60 KW, i.e., $L = 60$ KW. The incremental cost to generate an additional unit of energy, $c_o$, is set to be \$0.08/KWh, which is calculated according to the gas price [2] and the generator efficiency [4]. Similar to [37], we set the sunk cost of running the generator for unit time to $c_m = \$1.2$ and the startup cost $\beta_g$ equivalent to the amortized capital cost, which gives $\beta_g = \$24$. Besides, we assume the number of generators is $N = 10$, which is enough to satisfy all the energy demand for the trace and model we use.
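With these generator parameters, the per-unit cost of on-site energy at full load and the factor $L P_{\max} / (L c_o + c_m)$ from Theorem 2 can be worked out directly. The Python sketch below does so; the maximum grid price used in the example is a hypothetical value, not one reported in the paper, and the function names are illustrative.

```python
def full_load_unit_cost(L, c_o, c_m):
    """Cost in $/KWh of on-site energy when a generator runs at its maximum output L."""
    return c_o + c_m / L

def optimality_loss_factor(L, p_max, c_o, c_m):
    """rho = L * P_max / (L * c_o + c_m), the factor stated in Theorem 2."""
    return L * p_max / (L * c_o + c_m)

L_KW, C_O, C_M, N = 60.0, 0.08, 1.2, 10
unit_cost = full_load_unit_cost(L_KW, C_O, C_M)      # 0.08 + 1.2 / 60 dollars per KWh
on_site_capacity = N * L_KW                          # total on-site capacity in KW

# Hypothetical maximum grid price of $0.15/KWh, for illustration only.
rho = optimality_loss_factor(L_KW, 0.15, C_O, C_M)
```

The factor is simply the maximum grid price divided by the full-load unit cost of on-site generation, which is why a larger gap between the two makes joint optimization more valuable.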

Cost benchmark: Current data centers usually do not use dynamic capacity provisioning and on-site generators. Thus, we use the cost incurred by static capacity provisioning with grid power as the benchmark against which we evaluate the cost reduction due to our algorithms. Static capacity provisioning runs a fixed number of servers at all times to serve the workload, without dynamically turning the servers on or off. For our benchmark, we assume that the data center has complete workload information ahead of time, provisions exactly to satisfy the peak workload, and uses only grid power. Using such a benchmark gives us a conservative evaluation of the cost saving from our algorithms.
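A minimal Python sketch of this benchmark and of the percentage cost reduction is given below. It assumes the power demand is supplied as a function `power(x, a)` of the number of active servers and the workload; the linear example used here ignores cooling and power conditioning for brevity, and all names and numbers are illustrative.

```python
import math

def static_benchmark_cost(a, p, power):
    """Grid cost of static provisioning: ceil(max workload) servers on in every slot.

    a     : workload per slot (in units of one server's capacity)
    p     : grid price per slot ($/KWh), one-hour slots assumed
    power : function (num_servers, workload) -> power draw in KW
    """
    x_static = math.ceil(max(a))
    return sum(p_t * power(x_static, a_t) for a_t, p_t in zip(a, p))

def cost_reduction_percent(cost, benchmark_cost):
    """Cost reduction relative to the benchmark, in percent."""
    return 100.0 * (benchmark_cost - cost) / benchmark_cost

def linear_power(x, a, c_idle=0.1, c_peak=0.25):
    """Hypothetical server-only power model: idle power per active server plus load-proportional power."""
    return c_idle * x + (c_peak - c_idle) * a

benchmark = static_benchmark_cost([1.5, 3.2, 0.4], [0.1, 0.2, 0.05], linear_power)
```

Because the benchmark already knows the peak workload exactly, any saving measured against it understates the gain over a more cautiously over-provisioned data center.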

Comparisons of Algorithms: We compare four algorithms: 
our online and offline optimal algorithms in on-grid scenar- 
ios, i.e., GCSR and CPOFF, and hybrid scenarios, i.e., 
DCMON and DCMOFF. 





Figure 7: Variation of cost reduction with model parameters. (a) Cost reduction vs. $c_o$ (\$/KWh). (b) Cost reduction vs. PPF $= (c_{peak} - c_{idle}) / c_{peak}$. Curves in both panels: DCMOFF, DCMON, CPOFF, GCSR.

6.2 Impact of Model Parameters on Cost Reduction

We study the cost reduction provided by our offline and online algorithms for both on-grid and hybrid data centers using the New York trace unless specified otherwise. We assume no look-ahead information is available when running the online algorithms. We compute the cost reduction (in percentage) as compared to the cost benchmark described earlier. When all parameters take their default values, our offline (resp., online) algorithms provide up to 12.3% (resp., 7.3%) cost reduction for on-grid and 25.8% (resp., 20.7%) cost reduction for hybrid data centers (c.f. Fig. 7; the default value of $c_o$ is \$0.08/KWh). Note that the online algorithms provide cost reductions that are about 5 percentage points smaller than those of the offline algorithms on account of their lack of knowledge of future inputs. Further, note that the cost reduction of a hybrid data center is larger than that of an on-grid data center, since a hybrid data center has the ability to generate energy on-site to avoid higher grid prices. Nevertheless, the extent of cost reduction in all cases is high, providing strong evidence for the need to perform energy and server capacity optimizations.

Data centers may deploy different types of servers and generators with different model parameters. It is then important to understand the impact of these parameters on cost reduction. We first study the impact of varying $c_o$ (c.f. Fig. 7). For a hybrid data center, as $c_o$ increases, the cost of on-site generation increases, making it less effective for cost reduction (c.f. Fig. 7a). For the same reason, the cost reduction of a hybrid data center tends to that of the on-grid data center with increasing $c_o$, as on-site generation becomes less economical.

We then study the impact of the power proportional factor (PPF). More specifically, we fix $c_{peak} = 0.25$ KW and vary PPF from 0 to 1 (c.f. Fig. 7b). As PPF increases, the server idle power decreases, and thus dynamic provisioning has less impact on the cost reduction. This explains why CP achieves no cost reduction when PPF = 1. Since DCM also solves the CP problem, its performance degrades with increasing PPF as well.

6.3 The Relative Value of Energy versus Capacity Provisioning

In this subsection, we use both the New York and San Jose traces. For a hybrid data center, we ask which optimization provides a larger cost reduction: energy provisioning (EP) or server capacity provisioning (CP), in comparison with the joint optimization of doing both (DCM). The cost reductions of the different optimizations are shown in Fig. 8.

Figure 8: Relative values of CP, EP, and DCM.

For the New York scenario in Fig. 8a, overall, we see that EP, CP, and DCM provide cost reductions of 16.3%, 7.3%, and 20.7%, respectively. However, note that during the day doing EP alone provides almost as much cost reduction as the joint optimization DCM. The reason is that during the high-traffic hours in the day, solving EP to avoid higher grid prices provides a larger benefit than optimizing the energy consumption by server shutdown. The opposite is true during the night, where CP is more critical than EP, since minimizing the energy consumption by shutting down idle servers yields more benefit.

For the San Jose scenario in Fig. 8b, overall, EP, CP, and DCM provide cost reductions of 6.1%, 19%, and 23.7%, respectively. Compared to the New York scenario, the reason why EP achieves so little cost reduction is that the grid power is cheaper and thus on-site generation is not that economical. Meanwhile, CP performs closer to DCM, because the workload curve is highly skewed (shown in Fig. 6b) and dynamic provisioning of the server capacity saves a lot of server idling cost as well as cooling and power conditioning cost.

In a nutshell, EP favors a high grid power price, while a workload with a less regular pattern makes CP more competitive.

6.4 Benefit of Looking Ahead

We evaluate the cost reduction benefit of increasing the look-ahead window. From Fig. 9a, we observe that while the performance of our online algorithms is already good when there is no look-ahead information, they quickly improve to the offline optimal when a small amount of look-ahead, e.g., 6 hours, is available, indicating the value of short-term prediction of inputs. Note that while the competitive ratio analysis in Theorem 8 is for the worst-case inputs, our online algorithms perform much closer to the offline optimal for realistic inputs.

Figure 9: Variation of cost reduction with look-ahead and on-site capacity.

6.5 How Much On-site Power Production is Enough

Thus far, in our experiments, we assumed that a hybrid data center had the ability to supply all its energy from on-site power generation ($N = 10$). However, an important question is how much investment a data center operator should make in installing on-site generator capacity to obtain the largest cost reduction.

More specifically, we vary the number of on-site generators $N$ from 0 to 10 and show the corresponding performances of our algorithms. Interestingly, in Fig. 9b, our results show that provisioning on-site generators to produce 80% of the peak power demand of the data center is sufficient to obtain all of the cost reduction benefits. Further, with just 60% on-site power generation capacity we can achieve 95% of the maximum cost reduction. The intuitive reason is that most of the time the demands of the data center are significantly lower than their peaks.



7. RELATED WORK 

Our study is among a series of works on dynamic provisioning in data centers and power systems [38], [22], [35].

In particular, for the capacity provisioning problem, [23] and [27] propose online algorithms with performance guarantees to reduce the operating cost of servers under convex and linear mixed-integer optimization scenarios, respectively. Different from these two, our work designs online algorithms under a non-linear mixed-integer optimization scenario, and we take into account the operating cost of servers as well as power conditioning and cooling systems. [24], [39] also model cooling systems, but focus on offline optimization of the operating cost.

Energy provisioning for power systems is characterized by the unit commitment problem (UC) [5], [31], with solutions including a mixed-integer programming approach [29] and a stochastic control approach [38]. All these approaches assume the demand (or its distribution) over the entire horizon is known a priori; thus, they are applicable only when future input information can be predicted with a certain level of accuracy. In contrast, in this paper we consider an online setting where the algorithms may utilize only information in the current time slot.

In addition to the difference of our work and existing 
works in the two problems (i.e., capacity provisioning and 
energy provisioning), our work is also unique in that we 
jointly optimize both problems while existing works focus 
on only one of them. 

8. CONCLUSIONS 

Our work focuses on the cost minimization of data centers achieved by jointly optimizing both the supply of energy from on-site power generators and the grid, and the demand for energy from their deployed servers as well as power conditioning and cooling systems. We show that such an integrated approach is not only possible in next-generation data centers but also desirable for achieving significant cost reductions. Our offline optimal algorithm and our online algorithms with provably good competitive ratios provide key ideas on how to coordinate energy procurement and production with energy consumption. Our empirical work answers several of the important questions relevant to data center operators focusing on minimizing their operating costs. We show that a hybrid (resp., on-grid) data center can achieve a cost reduction between 20.7% and 25.8% (resp., 7.3% and 12.3%) by employing our joint optimization framework. We also show that on-site power generation can provide an additional cost reduction of about 13%, and that most of the additional benefit is obtained by a partial on-site generation capacity of 60% of the peak power requirement of the data center.

This work can be extended in several directions. First, it is interesting to study how energy storage devices can be used to further reduce the data center operating cost. Second, another interesting direction is to generalize our analysis to take into account deferrable workloads. Third, the extension from homogeneous servers and generators to a heterogeneous setting is also of great interest.
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APPENDIX 

A. PROOF OF THEOREM 3 

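First, we show that the combined solution $\sum_{i=1}^{M}\bar{x}_i$ is
optimal to CP.

Denote $C_{\mathrm{CP}}(x)$ to be the cost of CP of solution $x$. Suppose
that $\tilde{x}$ is an optimal solution for CP. We will show that we
can construct a new feasible solution $\sum_{i=1}^{M}\hat{x}_i$ for CP, and
a new feasible solution $\hat{x}_i$ for each CP$_i$, such that

\[ C_{\mathrm{CP}}(\tilde{x}) = C_{\mathrm{CP}}\Big(\sum_{i=1}^{M}\hat{x}_i\Big) = \sum_{i=1}^{M} C_{\mathrm{CP}_i}(\hat{x}_i) + \sum_{t=1}^{T} p(t)\,d_t(0). \tag{13} \]

$\bar{x}_i$ is an optimal solution for each CP$_i$. Hence,
$C_{\mathrm{CP}_i}(\hat{x}_i) \ge C_{\mathrm{CP}_i}(\bar{x}_i)$ for each $i$. Thus,

\[ C_{\mathrm{CP}}(\tilde{x}) = \sum_{i=1}^{M} C_{\mathrm{CP}_i}(\hat{x}_i) + \sum_{t=1}^{T} p(t)\,d_t(0) \ge \sum_{i=1}^{M} C_{\mathrm{CP}_i}(\bar{x}_i) + \sum_{t=1}^{T} p(t)\,d_t(0). \tag{14} \]

Besides, we also can prove that

\[ \sum_{i=1}^{M} C_{\mathrm{CP}_i}(\bar{x}_i) + \sum_{t=1}^{T} p(t)\,d_t(0) \ge C_{\mathrm{CP}}\Big(\sum_{i=1}^{M}\bar{x}_i\Big). \tag{15} \]

Hence, $C_{\mathrm{CP}}(\tilde{x}) = C_{\mathrm{CP}}(\sum_{i=1}^{M}\bar{x}_i)$, i.e., $\sum_{i=1}^{M}\bar{x}_i$ is an
optimal solution for CP.

Then, we show $C_{\mathrm{CP}}(\sum_{i=1}^{M} x_i^{on}) \le \gamma \cdot C_{\mathrm{CP}}(\sum_{i=1}^{M}\bar{x}_i)$.
Because $C_{\mathrm{CP}_i}(x_i^{on}) \le \gamma \cdot C_{\mathrm{CP}_i}(\bar{x}_i)$ and $\bar{x}_i$ is optimal for CP$_i$, we
have $\gamma \ge 1$. According to Eqn. (14), we obtain

\[ \gamma \cdot C_{\mathrm{CP}}(\tilde{x}) \ge \sum_{i=1}^{M} C_{\mathrm{CP}_i}(x_i^{on}) + \sum_{t=1}^{T} p(t)\,d_t(0). \]

Besides, we also can prove that

\[ \sum_{i=1}^{M} C_{\mathrm{CP}_i}(x_i^{on}) + \sum_{t=1}^{T} p(t)\,d_t(0) \ge C_{\mathrm{CP}}\Big(\sum_{i=1}^{M} x_i^{on}\Big). \tag{16} \]

Hence, $C_{\mathrm{CP}}(\sum_{i=1}^{M} x_i^{on}) \le \gamma \cdot C_{\mathrm{CP}}(\sum_{i=1}^{M}\bar{x}_i)$.
It remains to prove Eqns. (13), (15) and (16), which we
show in Lemmas 1 and 2.

Lemma 1. $C_{\mathrm{CP}}(\tilde{x}) = C_{\mathrm{CP}}(\sum_{i=1}^{M}\hat{x}_i) = \sum_{i=1}^{M} C_{\mathrm{CP}_i}(\hat{x}_i) + \sum_{t=1}^{T} p(t)\,d_t(0)$.

Proof. Define $\hat{x}_i$ based on $\tilde{x}$ by:

\[ \hat{x}_i(t) = \begin{cases} 1, & \text{if } i \le \tilde{x}(t), \\ 0, & \text{otherwise.} \end{cases} \]

It is straightforward to see that

\[ \tilde{x}(t) = \sum_{i=1}^{M}\hat{x}_i(t) \]

and $\hat{x}_i$ is a feasible solution for CP$_i$, i.e., $\hat{x}_i \ge a_i$.

So we have $C_{\mathrm{CP}}(\tilde{x}) = C_{\mathrm{CP}}(\sum_{i=1}^{M}\hat{x}_i)$.
Note that $\hat{x}_1(t) \ge \dots \ge \hat{x}_M(t)$ is a decreasing sequence.
Because $\hat{x}_i(t) \in \{0, 1\}$, $\forall i, t$, we obtain

\[ \sum_{i=1}^{M} [\hat{x}_i(t) - \hat{x}_i(t-1)]^+ = \begin{cases} 0, & \text{if } \sum_{i=1}^{M}\hat{x}_i(t) \le \sum_{i=1}^{M}\hat{x}_i(t-1), \\ \sum_{i=1}^{M}\hat{x}_i(t) - \sum_{i=1}^{M}\hat{x}_i(t-1), & \text{otherwise} \end{cases} = \Big[\sum_{i=1}^{M}\hat{x}_i(t) - \sum_{i=1}^{M}\hat{x}_i(t-1)\Big]^+ \tag{17} \]

and, with $d_t^i = d_t(i) - d_t(i-1)$,

\[ \sum_{i=1}^{M} d_t^i\,\hat{x}_i(t) + d_t(0) = \sum_{i=1}^{\tilde{x}(t)} d_t^i + d_t(0) = \sum_{i=1}^{\tilde{x}(t)} [d_t(i) - d_t(i-1)] + d_t(0) = d_t(\tilde{x}(t)) - d_t(0) + d_t(0) = d_t(\tilde{x}(t)) = d_t\Big(\sum_{i=1}^{M}\hat{x}_i(t)\Big). \tag{18} \]

By Eqns. (17) and (18),

\[ C_{\mathrm{CP}}\Big(\sum_{i=1}^{M}\hat{x}_i\Big) = \sum_{i=1}^{M} C_{\mathrm{CP}_i}(\hat{x}_i) + \sum_{t=1}^{T} p(t)\,d_t(0). \]

This completes the proof of this lemma. □

Lemma 2. $\sum_{i=1}^{M} C_{\mathrm{CP}_i}(x_i) + \sum_{t=1}^{T} p(t)\,d_t(0) \ge C_{\mathrm{CP}}(\sum_{i=1}^{M} x_i)$,
where $x_i$ is any feasible solution for problem CP$_i$.

Proof. First, it is straightforward that

\[ \sum_{i=1}^{M} [x_i(t) - x_i(t-1)]^+ \ge \Big[\sum_{i=1}^{M} x_i(t) - \sum_{i=1}^{M} x_i(t-1)\Big]^+. \tag{19} \]

Denote $x(t) = \sum_{i=1}^{M} x_i(t)$. Then, $\forall t$,

\[ \sum_{i=1}^{M} d_t^i\,x_i(t) + d_t(0) \ge \sum_{i=1}^{x(t)} d_t^i + d_t(0) = d_t(x(t)) - d_t(0) + d_t(0) = d_t(x(t)) = d_t\Big(\sum_{i=1}^{M} x_i(t)\Big), \tag{20} \]

where the first inequality comes from $x_i(t) \in \{0, 1\}$ and
$d_t^1 \le d_t^2 \le \dots \le d_t^M$. This is because $d_t^i = d_t(i) - d_t(i-1)$
and $d_t(x)$ is convex in $x$.

This lemma follows from Eqns. (19) and (20). □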
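The decomposition in Lemmas 1 and 2 can be sanity-checked numerically. The following is a minimal sketch, not part of the original proof: the toy prices, the quadratic demand function, and the initial condition $x(0) = 0$ are assumptions made only for illustration, and the helper names (`cp_cost`, `cpi_cost`, `layer`) are hypothetical.

```python
def pos(v):
    """[v]^+ = max(v, 0)."""
    return max(v, 0)


def cp_cost(x, p, d, beta_s):
    """Cost of CP for x(1..T), assuming x(0) = 0; d[t][k] stands for d_t(k)."""
    total, prev = 0, 0
    for t, xt in enumerate(x):
        total += p[t] * d[t][xt] + beta_s * pos(xt - prev)
        prev = xt
    return total


def cpi_cost(xi, i, p, d, beta_s):
    """Cost of CP_i, using the marginal demand d_t^i = d_t(i) - d_t(i-1)."""
    total, prev = 0, 0
    for t, v in enumerate(xi):
        total += p[t] * (d[t][i] - d[t][i - 1]) * v + beta_s * pos(v - prev)
        prev = v
    return total


def layer(x, M):
    """Layering of Lemma 1: x_hat_i(t) = 1 if i <= x(t), else 0, for i = 1..M."""
    return [[1 if i <= xt else 0 for xt in x] for i in range(1, M + 1)]


# Toy instance (hypothetical numbers): M = 3 servers, convex increasing d_t.
M, beta_s = 3, 7
p = [2, 5, 1, 4]
d = [[1 + k + k * k for k in range(M + 1)] for _ in p]
x = [2, 3, 0, 1]

layers = layer(x, M)
base = sum(p[t] * d[t][0] for t in range(len(p)))
lhs = cp_cost(x, p, d, beta_s)
rhs = sum(cpi_cost(layers[i - 1], i, p, d, beta_s) for i in range(1, M + 1)) + base
print(lhs, rhs)  # the two sides of Lemma 1 for the layered split
```

For a split that is feasible but not layered, only the inequality of Lemma 2 is expected to hold, since both the switching term (19) and the demand term (20) can be strictly larger.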



B. PROOF OF THEOREM 5 

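First, we show that the combined solution $\sum_{i=1}^{N}\bar{y}_i$ is
optimal to EP.

Denote $C_{\mathrm{EP}}(y)$ to be the cost of EP of solution $y$. Suppose
that $\tilde{y}$ is an optimal solution for EP. We will show that we
can construct a new feasible solution $\sum_{i=1}^{N}\hat{y}_i$ for EP, and
a new feasible solution $\hat{y}_i$ for each EP$_i$, such that

\[ C_{\mathrm{EP}}(\tilde{y}) = C_{\mathrm{EP}}\Big(\sum_{i=1}^{N}\hat{y}_i\Big) = \sum_{i=1}^{N} C_{\mathrm{EP}_i}(\hat{y}_i) + \sum_{t=1}^{T} p(t)\,[e(t) - NL]^+. \tag{21} \]

$\bar{y}_i$ is an optimal solution for each EP$_i$. Hence,
$C_{\mathrm{EP}_i}(\hat{y}_i) \ge C_{\mathrm{EP}_i}(\bar{y}_i)$ for each $i$. Thus,

\[ C_{\mathrm{EP}}(\tilde{y}) = \sum_{i=1}^{N} C_{\mathrm{EP}_i}(\hat{y}_i) + \sum_{t=1}^{T} p(t)\,[e(t) - NL]^+ \ge \sum_{i=1}^{N} C_{\mathrm{EP}_i}(\bar{y}_i) + \sum_{t=1}^{T} p(t)\,[e(t) - NL]^+. \tag{22} \]

Besides, we also can prove that

\[ \sum_{i=1}^{N} C_{\mathrm{EP}_i}(\bar{y}_i) + \sum_{t=1}^{T} p(t)\,[e(t) - NL]^+ \ge C_{\mathrm{EP}}\Big(\sum_{i=1}^{N}\bar{y}_i\Big). \tag{23} \]

Hence, $C_{\mathrm{EP}}(\tilde{y}) = C_{\mathrm{EP}}(\sum_{i=1}^{N}\bar{y}_i)$, i.e., $\sum_{i=1}^{N}\bar{y}_i$ is an
optimal solution for EP.

Then, we show $C_{\mathrm{EP}}(\sum_{i=1}^{N} y_i^{on}) \le \gamma \cdot C_{\mathrm{EP}}(\sum_{i=1}^{N}\bar{y}_i)$.
Because $C_{\mathrm{EP}_i}(y_i^{on}) \le \gamma \cdot C_{\mathrm{EP}_i}(\bar{y}_i)$ and $\bar{y}_i$ is optimal for
EP$_i$, we have $\gamma \ge 1$. According to Eqn. (22), we have

\[ \gamma \cdot C_{\mathrm{EP}}(\tilde{y}) \ge \sum_{i=1}^{N} C_{\mathrm{EP}_i}(y_i^{on}) + \sum_{t=1}^{T} p(t)\,[e(t) - NL]^+. \]

Besides, we also can prove that

\[ \sum_{i=1}^{N} C_{\mathrm{EP}_i}(y_i^{on}) + \sum_{t=1}^{T} p(t)\,[e(t) - NL]^+ \ge C_{\mathrm{EP}}\Big(\sum_{i=1}^{N} y_i^{on}\Big). \tag{24} \]

Hence, $C_{\mathrm{EP}}(\sum_{i=1}^{N} y_i^{on}) \le \gamma \cdot C_{\mathrm{EP}}(\sum_{i=1}^{N}\bar{y}_i)$.
It remains to prove Eqns. (21), (23) and (24), which we
show in Lemmas 3 and 4.

Lemma 3. $C_{\mathrm{EP}}(\tilde{y}) = C_{\mathrm{EP}}(\sum_{i=1}^{N}\hat{y}_i) = \sum_{i=1}^{N} C_{\mathrm{EP}_i}(\hat{y}_i) + \sum_{t=1}^{T} p(t)\,[e(t) - NL]^+$.

Proof. Define $\hat{y}_i$ based on $\tilde{y}$ by:

\[ \hat{y}_i(t) = \begin{cases} 1, & \text{if } i \le \tilde{y}(t), \\ 0, & \text{otherwise.} \end{cases} \]

It is straightforward to see that

\[ \tilde{y}(t) = \sum_{i=1}^{N}\hat{y}_i(t). \]

So we have $C_{\mathrm{EP}}(\tilde{y}) = C_{\mathrm{EP}}(\sum_{i=1}^{N}\hat{y}_i)$.
According to EP,

\[ C_{\mathrm{EP}}\Big(\sum_{i=1}^{N}\hat{y}_i\Big) = \sum_{t=1}^{T}\Big\{ \psi\Big(\sum_{i=1}^{N}\hat{y}_i(t), p(t), e(t)\Big) + \beta_g\Big[\sum_{i=1}^{N}\hat{y}_i(t) - \sum_{i=1}^{N}\hat{y}_i(t-1)\Big]^+ \Big\} \tag{25} \]

and

\[ \sum_{i=1}^{N} C_{\mathrm{EP}_i}(\hat{y}_i) = \sum_{i=1}^{N}\sum_{t=1}^{T}\Big\{ \psi(\hat{y}_i(t), p(t), e_i(t)) + \beta_g\,[\hat{y}_i(t) - \hat{y}_i(t-1)]^+ \Big\}. \tag{26} \]

Note that $\hat{y}_1(t) \ge \dots \ge \hat{y}_N(t)$ is a decreasing sequence.
Because $\hat{y}_i(t) \in \{0, 1\}$, $\forall i, t$, we obtain

\[ \sum_{i=1}^{N} [\hat{y}_i(t) - \hat{y}_i(t-1)]^+ = \begin{cases} 0, & \text{if } \sum_{i=1}^{N}\hat{y}_i(t) \le \sum_{i=1}^{N}\hat{y}_i(t-1), \\ \sum_{i=1}^{N}\hat{y}_i(t) - \sum_{i=1}^{N}\hat{y}_i(t-1), & \text{otherwise} \end{cases} = \Big[\sum_{i=1}^{N}\hat{y}_i(t) - \sum_{i=1}^{N}\hat{y}_i(t-1)\Big]^+. \tag{27} \]

Also, according to its definition, $\psi(y(t), p(t), e(t))$ can be
rewritten as:

\[ \psi(y(t), p(t), e(t)) = \begin{cases} c_m y(t) + p(t)e(t), & \text{if } p(t) \le c_o, \\ c_m y(t) + p(t)e(t) + [c_o - p(t)]\min\{e(t), L\,y(t)\}, & \text{else.} \end{cases} \tag{28} \]

Next, we distinguish two cases:

Case 1: $e(t) \le NL$. In this case, $\sum_{i=1}^{N} e_i(t) = e(t)$ and
$[e(t) - NL]^+ = 0$. According to the definition of $e_i(t)$,
denoting $\hat{N} = \lfloor e(t)/L \rfloor \le N$, we have

\[ e_i(t) = \begin{cases} L, & \text{if } i \le \hat{N}, \\ e(t) - \hat{N}L, & \text{if } i = \hat{N} + 1, \\ 0, & \text{else.} \end{cases} \]

Because $\hat{y}_1(t) \ge \dots \ge \hat{y}_N(t)$ is a decreasing sequence and
$\hat{y}_i(t) \in \{0, 1\}$, $\forall i$, we have

\[ \sum_{i=1}^{N}\min\{e_i(t), L\,\hat{y}_i(t)\} = \begin{cases} L\sum_{i=1}^{N}\hat{y}_i(t), & \text{if } \sum_{i=1}^{N}\hat{y}_i(t) \le \hat{N}, \\ e(t), & \text{else} \end{cases} = \min\Big\{e(t), L\sum_{i=1}^{N}\hat{y}_i(t)\Big\}. \]

Thus, by Eqn. (28), we have

\[ \psi\Big(\sum_{i=1}^{N}\hat{y}_i(t), p(t), e(t)\Big) = \sum_{i=1}^{N}\psi(\hat{y}_i(t), p(t), e_i(t)) + p(t)\,[e(t) - NL]^+. \tag{29} \]

Case 2: $e(t) > NL$. In this case, $e_i(t) = L$, $\forall i \in [1, N]$, and
we have

\[ \sum_{i=1}^{N}\min\{e_i(t), L\,\hat{y}_i(t)\} = L\sum_{i=1}^{N}\hat{y}_i(t) = \min\Big\{e(t), L\sum_{i=1}^{N}\hat{y}_i(t)\Big\}. \]

Thus, by Eqn. (28), we have

\[ \psi\Big(\sum_{i=1}^{N}\hat{y}_i(t), p(t), e(t)\Big) = \sum_{i=1}^{N}\psi(\hat{y}_i(t), p(t), e_i(t)) + p(t)\,[e(t) - NL]^+. \tag{30} \]

By Eqns. (25)-(27), (29) and (30), we have

\[ C_{\mathrm{EP}}\Big(\sum_{i=1}^{N}\hat{y}_i\Big) = \sum_{i=1}^{N} C_{\mathrm{EP}_i}(\hat{y}_i) + \sum_{t=1}^{T} p(t)\,[e(t) - NL]^+. \]

This completes the proof of this lemma. □

Lemma 4. $\sum_{i=1}^{N} C_{\mathrm{EP}_i}(y_i) + \sum_{t=1}^{T} p(t)\,[e(t) - NL]^+ \ge C_{\mathrm{EP}}(\sum_{i=1}^{N} y_i)$,
where $y_i$ is any feasible solution for problem EP$_i$.

Proof. First, it is straightforward that

\[ \sum_{i=1}^{N} [y_i(t) - y_i(t-1)]^+ \ge \Big[\sum_{i=1}^{N} y_i(t) - \sum_{i=1}^{N} y_i(t-1)\Big]^+. \tag{31} \]

Then by Eqn. (28) and the fact that $\sum_{i=1}^{N} e_i(t) = \min\{e(t), NL\}$
and

\[ \sum_{i=1}^{N}\min\{e_i(t), L\,y_i(t)\} \le \min\Big\{\sum_{i=1}^{N} e_i(t), L\sum_{i=1}^{N} y_i(t)\Big\} \le \min\Big\{e(t), L\sum_{i=1}^{N} y_i(t)\Big\}, \]

we have

\[ \psi\Big(\sum_{i=1}^{N} y_i(t), p(t), e(t)\Big) \le \sum_{i=1}^{N}\psi(y_i(t), p(t), e_i(t)) + p(t)\,[e(t) - NL]^+. \tag{32} \]

This lemma follows from Eqns. (31) and (32). □

C. PROOF OF THEOREM 4

First, we will characterize an offline optimal algorithm for
CP$_i$.

Then, based on the optimal algorithm, we prove the competitive
ratio of our future-aware online algorithm GCSR$_s^{(w)}$.

Finally, we prove the lower bound of the competitive ratio of
any deterministic online algorithm.

In CP$_i$, the workload input $a_i$ takes value in $[0, 1]$ and
exactly one server is required to serve each $a_i$. When $a_i(t) > 0$,
we must keep $x_i(t) = 1$ to satisfy the feasibility condition.
The problem is what we should do if the server is already
active but there is no workload, i.e., $a_i(t) = 0$.

To illustrate the problem better, we define the idling interval
$I_1$ as follows: $I_1 = [t_1, t_2]$, such that (i) $a_i(t_1 - 1) > 0$;
(ii) $a_i(t_2 + 1) > 0$; (iii) $\forall \tau \in [t_1, t_2]$, $a_i(\tau) = 0$. Similarly,
define the working interval $I_2$: $I_2 = [t_1, t_2]$, such that (i)
$a_i(t_1 - 1) = 0$; (ii) $a_i(t_2 + 1) = 0$; (iii) $\forall \tau \in [t_1, t_2]$, $a_i(\tau) > 0$.
Define the starting interval $I_s$: $I_s = [0, t_2]$, such that (i)
$a_i(t_2 + 1) > 0$; (ii) $\forall \tau \in [0, t_2]$, $a_i(\tau) = 0$. Define the ending
interval $I_e$: $I_e = [t_1, T + 1]$, such that (i) $a_i(t_1 - 1) > 0$; (ii)
$\forall \tau \in [t_1, T + 1]$, $a_i(\tau) = 0$.

Based on the above definitions, we have the following
offline optimal algorithm CPOFF$_s$ for problem CP$_i$.

Algorithm 3 An offline optimal Algorithm CPOFF$_s$ for CP$_i$

1: According to $a_i$, find $I_s$, $I_e$ and all the $I_1$ and $I_2$.
2: During $I_s$ and $I_e$, set $x_i = 0$.
3: During each $I_2$, set $x_i = 1$.
4: During each $I_1$,
5: if $\sum_{t \in I_1} d_t^i\,p(t) \ge \beta_s$ then
6:    set $x_i(\tau) = 0$, $\forall \tau \in I_1$.
7: else
8:    set $x_i(\tau) = 1$, $\forall \tau \in I_1$.
9: end if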

Lemma 5. CPOFF$_s$ is an offline optimal algorithm to
problem CP$_i$.

Proof. It is easy to see that it is optimal to set $x_i = 0$
during $I_s$ and $I_e$ and set $x_i = 1$ during each $I_2$.

During an $I_1$, an offline optimal solution must set either
$x_i(\tau) = 0$ or $x_i(\tau) = 1$, $\forall \tau \in I_1$; otherwise, it will incur
unnecessary switching cost and cannot be optimal. The
cost of setting $x_i = 1$ during an $I_1$ is $\sum_{t \in I_1} d_t^i\,p(t)$. The
cost of setting $x_i = 0$ during $I_1$ is $\beta_s$, because we must pay
a turn-on cost $\beta_s$ after this $I_1$. Thus the above algorithm
CPOFF$_s$ is an offline optimal algorithm to CP$_i$. □
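The interval rule of CPOFF$_s$ can be sketched in a few lines. This is an illustrative sketch under assumptions, not the authors' implementation: time slots are 0-indexed, `idle_cost[t]` stands for the per-slot idling cost $d_t^i\,p(t)$, and leading or trailing idle stretches play the role of $I_s$ and $I_e$.

```python
def cpoff(a, idle_cost, beta_s):
    """Offline rule for one server: a[t] is the workload in [0, 1] at slot t,
    idle_cost[t] is the cost of keeping the server idle at slot t."""
    T = len(a)
    x = [0] * T
    busy = [t for t in range(T) if a[t] > 0]
    for t in busy:
        x[t] = 1  # working intervals: the server must be on
    # Idling intervals are the gaps strictly between consecutive busy slots.
    for left, right in zip(busy, busy[1:]):
        gap = range(left + 1, right)
        if sum(idle_cost[t] for t in gap) < beta_s:
            for t in gap:
                x[t] = 1  # idling through the gap is cheaper than a restart
    return x  # starting and ending intervals stay off


print(cpoff([0, 1, 0, 0, 1, 0, 0, 0, 1, 0], [1] * 10, 3))
```

In this toy trace the first gap costs 2 to idle through, which is below the turn-on cost 3, so the server stays on; the second gap costs 3, so the server is turned off for it.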

Lemma 6. GCSR$_s^{(w)}$ is $(2 - \alpha_s)$-competitive for problem
CP$_i$, where $\alpha_s = \min(1, w\,d_{\min}P_{\min}/\beta_s) \in [0, 1]$ and
$d_{\min} = \min_t\{d_t(1) - d_t(0)\} \ge 0$.

Proof. We compare our online algorithm GCSR$_s^{(w)}$ and
the offline optimal algorithm CPOFF$_s$ described above for
problem CP$_i$ and prove the competitive ratio. Let $x_i^{on}$ and
$\bar{x}_i$ be the solutions obtained by GCSR$_s^{(w)}$ and CPOFF$_s$
for problem CP$_i$, respectively.

Since $d_t(x(t))$ is increasing and convex in $x(t)$, we have

\[ d_t^i = d_t(i) - d_t(i-1) \ge d_t(i-1) - d_t(i-2) \ge \dots \ge d_t(1) - d_t(0) \ge \min_t\{d_t(1) - d_t(0)\} = d_{\min} \ge 0. \tag{33} \]

It is easy to see that during $I_s$ and $I_2$, GCSR$_s^{(w)}$ and
CPOFF$_s$ have the same actions. Since the adversary can
choose $T$ to be large enough, we can omit the cost incurred
during $I_e$ when doing competitive analysis. Thus, we
only need to consider the cost incurred by GCSR$_s^{(w)}$
and CPOFF$_s$ during each $I_1$. Notice that at the beginning
of an $I_2$, both algorithms may incur switching cost. However,
there must be an $I_1$ before an $I_2$. So this switching cost will
be taken into account when we analyze the cost incurred
during $I_1$. More formally, for a certain $I_1$, denoted as $[t_1, t_2]$,

\[ \mathrm{Cost}_{I_1}(x_i) = \sum_{t=t_1}^{t_2+1} p(t)\,d_t^i\,\big(x_i(t) - \lceil a_i(t) \rceil\big) + \beta_s\sum_{t=t_1}^{t_2+1} [x_i(t) - x_i(t-1)]^+ = \sum_{t=t_1}^{t_2} p(t)\,d_t^i\,x_i(t) + \beta_s\sum_{t=t_1}^{t_2+1} [x_i(t) - x_i(t-1)]^+. \tag{34} \]

GCSR$_s^{(w)}$ performs as follows: it accumulates an "idling
cost" and when it reaches $\beta_s$, it turns off the server; otherwise,
it keeps the server idle. Specifically, at time $t$, if there
exists $\tau \in [t, t + w]$ such that the idling cost till $\tau$ is at least
$\beta_s$, it turns off the server; otherwise, it keeps it idle. We
distinguish two cases:

Case 1: $w \ge \beta_s/(d_{\min}P_{\min})$. In this case, GCSR$_s^{(w)}$
performs the same as CPOFF$_s$, as we now show.

If $\sum_{t \in I_1} d_t^i\,p(t) \ge \beta_s$, CPOFF$_s$ turns off the server at the
beginning of the $I_1$, i.e., at $t_1$. Since $w \ge \beta_s/(d_{\min}P_{\min})$ and
$d_t^i \ge d_{\min}$ according to Eqn. (33), at $t_1$ GCSR$_s^{(w)}$ can find
a $\tau \in [t_1, t_1 + w]$ such that the idling cost till $\tau$ is at least
$\beta_s$, as a consequence of which it also turns off the server at
the beginning of the $I_1$. Both algorithms turn on the server
at the beginning of the following $I_2$. Thus, we obtain

\[ \mathrm{Cost}_{I_1}(x_i^{on}) = \mathrm{Cost}_{I_1}(\bar{x}_i) = \beta_s. \tag{35} \]

If $\sum_{t \in I_1} d_t^i\,p(t) < \beta_s$, CPOFF$_s$ keeps the server idling
during the whole $I_1$. GCSR$_s^{(w)}$ finds that the accumulated idling
cost till the end of the $I_1$ will not reach $\beta_s$, so it also keeps
the server idling during the whole $I_1$. Thus, we have

\[ \mathrm{Cost}_{I_1}(x_i^{on}) = \mathrm{Cost}_{I_1}(\bar{x}_i) = \sum_{t \in I_1} d_t^i\,p(t). \]

Case 2: $w < \beta_s/(d_{\min}P_{\min})$. In this case, to beat GCSR$_s^{(w)}$,
the adversary will choose $p(t)$, $a_i(t)$ and $d_t^i$ so that GCSR$_s^{(w)}$
will keep the server idling for some time and then turn it
off, but CPOFF$_s$ will turn off the server at the beginning
of the $I_1$. Suppose GCSR$_s^{(w)}$ keeps the server idling for $\delta$
slots given no workload within the look-ahead window and
then turns it off. Then according to Algorithm 1 we must
have $\sum_{t=t_1}^{t_1+\delta+w-1} d_t^i\,p(t) < \beta_s$ and $\sum_{t=t_1}^{t_1+\delta+w} d_t^i\,p(t) \ge \beta_s$. In this
case, $\mathrm{Cost}_{I_1}(\bar{x}_i) = \beta_s$ and

\[ \mathrm{Cost}_{I_1}(x_i^{on}) = \sum_{t=t_1}^{t_1+\delta-1} d_t^i\,p(t) + \beta_s \le \beta_s - d_{\min}P_{\min}w + \beta_s = \beta_s\Big(2 - \frac{d_{\min}P_{\min}}{\beta_s}w\Big). \]

So

\[ \frac{C_{\mathrm{CP}_i}(x_i^{on})}{C_{\mathrm{CP}_i}(\bar{x}_i)} \le \frac{\mathrm{Cost}_{I_1}(x_i^{on})}{\mathrm{Cost}_{I_1}(\bar{x}_i)} \le 2 - \frac{d_{\min}P_{\min}}{\beta_s}w. \]

Combining the above two cases establishes this lemma.
Furthermore, we have some important observations on
$x_i^{on}$ and $\bar{x}_i$, which will be used in later proofs.

\[ \sum_{t=1}^{T} [x_i^{on}(t) - x_i^{on}(t-1)]^+ = \sum_{t=1}^{T} [\bar{x}_i(t) - \bar{x}_i(t-1)]^+. \tag{36} \]

This is because during an $I_1$ with $\sum_{t \in I_1} d_t^i\,p(t) \ge \beta_s$, $x_i^{on}$
keeps the server idling for some time and then turns it off.
$\bar{x}_i$ turns off the server at the beginning of the $I_1$. Both $x_i^{on}$
and $\bar{x}_i$ turn on the server at the beginning of the following
$I_2$. During an $I_1$ with $\sum_{t \in I_1} d_t^i\,p(t) < \beta_s$, both $x_i^{on}$ and $\bar{x}_i$
keep the server idling till the following $I_2$. Thus, $x_i^{on}$ and
$\bar{x}_i$ incur the same server switching cost. Besides, in both
above cases, $x_i^{on}(t)$ is no less than $\bar{x}_i(t)$, so we have

\[ x_i^{on}(t) \ge \bar{x}_i(t), \quad \forall t. \tag{37} \]

We also observe that

\[ \sum_{t=1}^{T} d_t^i\,p(t)\,\big(x_i^{on}(t) - \lceil a_i(t) \rceil\big) \le \sum_{t=1}^{T} d_t^i\,p(t)\,\big(\bar{x}_i(t) - \lceil a_i(t) \rceil\big) + (1 - \alpha_s)\,\beta_s\sum_{t=1}^{T} [\bar{x}_i(t) - \bar{x}_i(t-1)]^+. \tag{38} \]

By rearranging the terms, we obtain

\[ \sum_{t=1}^{T} d_t^i\,p(t)\,\big(x_i^{on}(t) - \bar{x}_i(t)\big) \le (1 - \alpha_s)\,\beta_s\sum_{t=1}^{T} [\bar{x}_i(t) - \bar{x}_i(t-1)]^+. \]

Notice that $\sum_{t=1}^{T} d_t^i\,p(t)\,(x_i(t) - \lceil a_i(t) \rceil)$ can be seen as the
total server idling cost incurred by solution $x_i$. Since idling
only happens in $I_1$, Eqn. (38) follows from the cases
discussed above. □
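The turn-off rule described in the proof of Lemma 6 can be sketched as follows. Algorithm 1 itself is not reproduced in this appendix, so this is only an illustrative reading of the rule "turn off at time $t$ if the idling cost reaches $\beta_s$ for some $\tau \in [t, t+w]$ with no workload arriving first"; the function name `gcsr`, the 0-indexed slots, and the per-slot cost vector `idle_cost` (standing for $d_t^i\,p(t)$) are assumptions.

```python
def gcsr(a, idle_cost, beta_s, w):
    """Online ski-rental style rule for one server with look-ahead window w."""
    T = len(a)
    x = [0] * T
    on = False
    idle_acc = 0.0  # idling cost paid so far in the current idle stretch
    for t in range(T):
        if a[t] > 0:
            on, idle_acc = True, 0.0  # workload present: server must be on
        elif on:
            acc = idle_acc
            # Look ahead over [t, t + w], clipped to the end of the trace.
            for tau in range(t, min(t + w, T - 1) + 1):
                if a[tau] > 0:
                    break  # workload returns before the threshold is hit
                acc += idle_cost[tau]
                if acc >= beta_s:
                    on = False  # idling would cost at least a restart
                    break
            if on:
                idle_acc += idle_cost[t]
        x[t] = 1 if on else 0
    return x


trace = [0, 1, 0, 0, 1, 0, 0, 0, 1, 0]
print(gcsr(trace, [1] * 10, 3, 0))  # no look-ahead: idles first, then turns off
print(gcsr(trace, [1] * 10, 3, 3))  # w >= beta_s / (d_min * P_min): turns off at once
```

With $w = 0$ the server idles for two slots of the long gap before turning off, paying 2 + 3 against the offline cost 3, which is within the factor $2 - \alpha_s = 2$. With $w = 3$ the decisions coincide with the offline rule on every idling interval; only the ending interval differs, which the analysis ignores.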

Lemma 7. $(2 - \alpha_s)$ is the lower bound of the competitive ratio
of any deterministic online algorithm for problem CP$_i$ and
also CP, where $\alpha_s = \min(1, w\,d_{\min}P_{\min}/\beta_s) \in [0, 1]$.

Proof. First, we show this lemma holds for problem
CP$_i$. We distinguish two cases:

Case 1: $w \ge \beta_s/(d_{\min}P_{\min})$. In this case, $(2 - \alpha_s) = 1$,
which is clearly the lower bound of the competitive ratio of any
online algorithm.

Case 2: $w < \beta_s/(d_{\min}P_{\min})$. Similar to the proof of
Lemma 6, we only need to analyze the behaviors of online and
offline algorithms during an idle interval $I_1$.

Consider the input: $d_t^i = d_{\min}$ and $p(t) = P_{\min}$, $\forall t \in I_1$.
Under this input, during an $I_1$, we only need to consider
a set of deterministic online algorithms with the following
behavior: either keep the server idling for the whole $I_1$ or
keep it idling for some slots and then turn it off until the
end of the $I_1$. The reason is that any deterministic online
algorithm not belonging to this set will turn off the server
at some time and turn on the server before the end of $I_1$,
and thus there must be an online algorithm incurring less
cost by turning off the server at the same time but turning
on the server at the end of $I_1$.

We characterize an algorithm ALG belonging to this set
by a parameter $\delta$, denoting the time it keeps the server idling
for given $a_i = 0$ within the look-ahead window. Denote the
solutions of algorithms ALG and CPOFF$_s$ for problem
CP$_i$ to be $x_i^{alg}$ and $\bar{x}_i$, respectively.

If $\delta$ is infinite, the competitive ratio is apparently infinite
due to the fact that the adversary can construct an $I_1$ whose
duration is infinite. Thus we only consider those algorithms
with finite $\delta$. The adversary will construct inputs as follows:

If $\delta + w \ge \beta_s/(d_{\min}P_{\min})$, the adversary will construct an
$I_1$ whose duration is longer than $\delta + w$. In this case, ALG
will keep the server idling for $\delta$ slots and then turn it off while
CPOFF$_s$ turns off the server at the beginning of the $I_1$ (cf.
Fig. 10a). Then the ratio is

\[ \frac{C_{\mathrm{CP}_i}(x_i^{alg})}{C_{\mathrm{CP}_i}(\bar{x}_i)} = \frac{\delta\,d_{\min}P_{\min} + \beta_s + d_{\min}P_{\min}}{\beta_s + d_{\min}P_{\min}} \ge 1 + \frac{[\beta_s/(d_{\min}P_{\min}) - w]\,d_{\min}P_{\min}}{\beta_s + d_{\min}P_{\min}} = 2 - \frac{d_{\min}P_{\min}(w + 1)}{\beta_s + d_{\min}P_{\min}}. \]

If $\delta + w < \beta_s/(d_{\min}P_{\min})$, the adversary will construct
an $I_1$ whose duration is exactly $\delta + w$. In this case, ALG
will keep the server idling for $\delta$ slots and then turn it off while
CPOFF$_s$ keeps the server idling during the whole $I_1$ (cf.
Fig. 10b). Then the ratio is

\[ \frac{C_{\mathrm{CP}_i}(x_i^{alg})}{C_{\mathrm{CP}_i}(\bar{x}_i)} = \frac{\delta\,d_{\min}P_{\min} + \beta_s + d_{\min}P_{\min}}{d_{\min}P_{\min}(\delta + w) + d_{\min}P_{\min}} = \frac{d_{\min}P_{\min}(\delta + w + 1) + \beta_s - w\,d_{\min}P_{\min}}{d_{\min}P_{\min}(\delta + w + 1)} > 1 + \frac{\beta_s - w\,d_{\min}P_{\min}}{\beta_s + d_{\min}P_{\min}} = 2 - \frac{d_{\min}P_{\min}(w + 1)}{\beta_s + d_{\min}P_{\min}}. \]

When $d_{\min} \to 0$ or $\beta_s \to \infty$, we have

\[ 2 - \frac{d_{\min}P_{\min}(w + 1)}{\beta_s + d_{\min}P_{\min}} \to 2 - \frac{d_{\min}P_{\min}w}{\beta_s}. \]

Combining the above two cases establishes the lower bound
for problem CP$_i$.

(a) $\delta + w \ge \beta_s/(d_{\min}P_{\min})$   (b) $\delta + w < \beta_s/(d_{\min}P_{\min})$
Figure 10: Worst case examples.

For problem CP, consider the case that $d_t(0) = 0$ and
$a(t) \in [0, 1]$, $\forall t$. In this case, it is straightforward that CP$_i$
is equivalent to CP. Thus, the lower bound for CP$_i$ is also
a lower bound for CP. □

Theorem 4 follows from Lemmas 6 and 7.
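The two adversarial ratios in the proof of Lemma 7 can be evaluated exactly for small integer parameters. The sketch below is only a numerical illustration; the function names and the sample values ($w = 2$, $d_{\min}P_{\min} = 1$, $\beta_s = 10$) are hypothetical, and `dp` stands for the product $d_{\min}P_{\min}$.

```python
from fractions import Fraction as F


def adversary_ratio(delta, w, dp, beta_s):
    """Cost ratio ALG / CPOFF_s on the adversarial idle interval of Lemma 7."""
    alg = delta * dp + beta_s + dp  # idle delta slots, restart, serve one slot
    if (delta + w) * dp >= beta_s:
        opt = beta_s + dp  # long interval: offline turns off immediately
    else:
        opt = dp * (delta + w) + dp  # interval of length delta + w: offline idles
    return F(alg, opt)


def lower_bound(w, dp, beta_s):
    """The common bound 2 - dp * (w + 1) / (beta_s + dp) from both cases."""
    return 2 - F(dp * (w + 1), beta_s + dp)


ratios = [adversary_ratio(delta, 2, 1, 10) for delta in range(50)]
print(min(ratios), lower_bound(2, 1, 10))
```

Whatever idling time $\delta$ the online algorithm picks, the adversary forces a ratio of at least the bound; in this instance the best choice is $\delta = 8$, which meets the bound with equality.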

D. PROOF OF THEOREM 6 

Instead of proving this theorem directly, we prove a stronger
theorem that fully characterizes an offline optimal solution.
Then Theorem 6 follows naturally. A very important structure
of an offline optimal solution is "critical segments",
which are constructed according to $R_i(t)$.

Definition 1. We divide all time intervals in $[1, T]$ into
disjoint parts called critical segments:

\[ [1, T_1^c],\ [T_1^c + 1, T_2^c],\ [T_2^c + 1, T_3^c],\ \dots,\ [T_k^c + 1, T]. \]

The critical segments are characterized by a set of critical
points: $T_1^c < T_2^c < \dots < T_k^c$. We define each critical point $T_j^c$
along with an auxiliary point $\tilde{T}_j^c$, such that the pair $(T_j^c, \tilde{T}_j^c)$
satisfies the following conditions:

(Boundary): Either ($R_i(T_j^c) = 0$ and $R_i(\tilde{T}_j^c) = -\beta_g$)
or ($R_i(T_j^c) = -\beta_g$ and $R_i(\tilde{T}_j^c) = 0$).

(Interior): $-\beta_g < R_i(\tau) < 0$ for all $T_j^c < \tau < \tilde{T}_j^c$.

In other words, each pair $(T_j^c, \tilde{T}_j^c)$ corresponds to an
interval where $R_i(t)$ goes from $-\beta_g$ to $0$ or from $0$ to $-\beta_g$, without
reaching the two extreme values inside the interval. For example,
$(T_1^c, \tilde{T}_1^c)$ and $(T_2^c, \tilde{T}_2^c)$ in Fig. 11 are two such pairs,
while the corresponding critical segments are $(T_1^c, T_2^c)$ and
$(T_2^c, T_3^c)$. It is straightforward to see that all $(T_j^c, \tilde{T}_j^c)$ are
uniquely defined, and hence critical segments are well-defined.
See Fig. 11 for an example.

Figure 11: An example of critical segments.

Once the time horizon $[1, T]$ is divided into critical segments,
we can now characterize the optimal solution.

Definition 2. We classify the type of a critical segment by:
Type-start (also called type-0): $[1, T_1^c]$
Type-1: $[T_j^c + 1, T_{j+1}^c]$, if $R_i(T_j^c) = -\beta_g$ and $R_i(T_{j+1}^c) = 0$
Type-2: $[T_j^c + 1, T_{j+1}^c]$, if $R_i(T_j^c) = 0$ and $R_i(T_{j+1}^c) = -\beta_g$
Type-end (also called type-3): $[T_k^c + 1, T]$
For completeness, we also let $T_0^c = 0$ and $T_{k+1}^c = T$.

Then the following theorem characterizes an offline optimal
solution.

Theorem 9. An optimal solution for EP$_i$ is given by

\[ y_{\mathrm{OFA}}(t) = \begin{cases} 0, & \text{if } t \in [T_j^c + 1, T_{j+1}^c] \text{ is type-start/-2/-end}, \\ 1, & \text{if } t \in [T_j^c + 1, T_{j+1}^c] \text{ is type-1}. \end{cases} \tag{40} \]

Theorem 6 follows from Theorem 9 and Definition 2. Thus,
it remains to prove Theorem 9.

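D.1 Proof of Theorem 9

Before we prove the theorem, we introduce a lemma.
We define the cost with regard to a segment $j$ by:

\[ C_{\mathrm{EP}_i}^{\text{sg-}j}(y) \triangleq \sum_{t=T_j^c+1}^{T_{j+1}^c} \psi(y(t), p(t), e_i(t)) + \sum_{t=T_j^c+1}^{T_{j+1}^c+1} \beta_g \cdot [y(t) - y(t-1)]^+ \]

and define a subproblem for critical segment $j$ by:

\[ \mathrm{EP}_i^{\text{sg-}j}(y_j^l, y_j^r):\quad \min\ C_{\mathrm{EP}_i}^{\text{sg-}j}(y) \]
\[ \text{s.t. } y(T_j^c) = y_j^l,\ y(T_{j+1}^c + 1) = y_j^r, \]
\[ \text{var } y(t) \in \{0, 1\},\ t \in [T_j^c + 1, T_{j+1}^c]. \]

Note that due to the startup cost across segment boundaries,
in general $C_{\mathrm{EP}_i}(y) \ne \sum_j C_{\mathrm{EP}_i}^{\text{sg-}j}(y)$. In other words,
we should not expect that putting together the solutions to
each segment will lead to an overall offline optimal solution.
However, the following lemma shows an important structural
property: one optimal solution of $\mathrm{EP}_i^{\text{sg-}j}(y_j^l, y_j^r)$
is independent of the boundary conditions $(y_j^l, y_j^r)$, although the
optimal value depends on the boundary conditions.

Lemma 8. $(y_{\mathrm{OFA}}(t))_{t=T_j^c+1}^{T_{j+1}^c}$ in (40) is an optimal solution
for $\mathrm{EP}_i^{\text{sg-}j}(y_j^l, y_j^r)$, despite any boundary conditions $(y_j^l, y_j^r)$.

We first use this lemma to prove Theorem 9 and then we
prove this lemma. Suppose $(y^*(t))_{t=1}^{T}$ is an optimal solution
for EP$_i$. For completeness, we let $y^*(0) = 0$ and
$y^*(T + 1) = 0$. We define a sequence $(y_0(t))_{t=1}^{T}$, $(y_1(t))_{t=1}^{T}$, ...,
$(y_{k+1}(t))_{t=1}^{T}$ as follows:

1. $y_0(t) = y^*(t)$ for all $t \in [1, T]$.

2. For all $t \in [1, T]$ and $j = 1, \dots, k$,

\[ y_j(t) = \begin{cases} y_{\mathrm{OFA}}(t), & \text{if } t \in [1, T_j^c], \\ y^*(t), & \text{otherwise.} \end{cases} \]

3. $y_{k+1}(t) = y_{\mathrm{OFA}}(t)$ for all $t \in [1, T]$.

We next set the boundary conditions for each $\mathrm{EP}_i^{\text{sg-}j}$ by

\[ y_j^l = y_{\mathrm{OFA}}(T_j^c) \quad \text{and} \quad y_j^r = y^*(T_{j+1}^c + 1). \tag{42} \]

It follows that

\[ C_{\mathrm{EP}_i}(y_j) - C_{\mathrm{EP}_i}(y_{j+1}) = C_{\mathrm{EP}_i}^{\text{sg-}j}(y^*) - C_{\mathrm{EP}_i}^{\text{sg-}j}(y_{\mathrm{OFA}}). \tag{43} \]

By Lemma 8, we obtain $C_{\mathrm{EP}_i}^{\text{sg-}j}(y^*) \ge C_{\mathrm{EP}_i}^{\text{sg-}j}(y_{\mathrm{OFA}})$ for
all $j$. Hence,

\[ C_{\mathrm{EP}_i}(y^*) = C_{\mathrm{EP}_i}(y_0) \ge \dots \ge C_{\mathrm{EP}_i}(y_{k+1}) = C_{\mathrm{EP}_i}(y_{\mathrm{OFA}}). \tag{44} \]

This completes the proof of Theorem 9.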

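Proof of Lemma 8. Consider any given boundary condition
$(y_j^l, y_j^r)$ for $\mathrm{EP}_i^{\text{sg-}j}$. Suppose $(\hat{y}(t))_{t=T_j^c+1}^{T_{j+1}^c}$ is an optimal
solution for $\mathrm{EP}_i^{\text{sg-}j}$ w.r.t. $(y_j^l, y_j^r)$, and $\hat{y} \ne y_{\mathrm{OFA}}$. We aim
to show $C_{\mathrm{EP}_i}^{\text{sg-}j}(\hat{y}) \ge C_{\mathrm{EP}_i}^{\text{sg-}j}(y_{\mathrm{OFA}})$, by considering the
types of critical segment.

(type-1): First, suppose that critical segment $[T_j^c + 1, T_{j+1}^c]$
is type-1. Hence, $y_{\mathrm{OFA}}(t) = 1$ for all $t \in [T_j^c + 1, T_{j+1}^c]$.
Hence,

\[ C_{\mathrm{EP}_i}^{\text{sg-}j}(y_{\mathrm{OFA}}) = \beta_g \cdot (1 - y_j^l) + \sum_{t=T_j^c+1}^{T_{j+1}^c} \psi(1, p(t), e_i(t)). \tag{45} \]

(Case 1): Suppose $\hat{y}(t) = 0$ for all $t \in [T_j^c + 1, T_{j+1}^c]$. Hence,

\[ C_{\mathrm{EP}_i}^{\text{sg-}j}(\hat{y}) = \beta_g \cdot y_j^r + \sum_{t=T_j^c+1}^{T_{j+1}^c} \psi(0, p(t), e_i(t)). \tag{46} \]

We obtain:

\[ C_{\mathrm{EP}_i}^{\text{sg-}j}(\hat{y}) - C_{\mathrm{EP}_i}^{\text{sg-}j}(y_{\mathrm{OFA}}) = \beta_g \cdot y_j^r + \sum_{t=T_j^c+1}^{T_{j+1}^c} r_i(t) - \beta_g(1 - y_j^l) \tag{47} \]
\[ \ge \beta_g \cdot y_j^r + R_i(T_{j+1}^c) - R_i(T_j^c) - \beta_g(1 - y_j^l) \tag{48} \]
\[ = \beta_g \cdot y_j^r + \beta_g - \beta_g + \beta_g \cdot y_j^l \ge 0, \tag{49} \]

where Eqn. (47) follows from the definition of $r_i(t)$
and Eqn. (48) follows from Lemma 9. This completes
the proof for Case 1.

(Case 2): Suppose $\hat{y}(t) = 1$ for some $t \in [T_j^c + 1, T_{j+1}^c]$.
This implies that $C_{\mathrm{EP}_i}^{\text{sg-}j}(\hat{y})$ has to involve the startup cost
$\beta_g$.

Next, we denote the minimal set of segments within
$[T_j^c + 1, T_{j+1}^c]$ by

\[ [\tau_1^b, \tau_1^e],\ [\tau_2^b, \tau_2^e],\ [\tau_3^b, \tau_3^e],\ \dots,\ [\tau_p^b, \tau_p^e] \]

such that $\hat{y}(t) \ne y_{\mathrm{OFA}}(t)$ for all $t \in [\tau_l^b, \tau_l^e]$, $l \in \{1, \dots, p\}$,
where $\tau_l^e < \tau_{l+1}^b$.

Since $\hat{y} \ne y_{\mathrm{OFA}}$, there exists at least one $t \in [T_j^c + 1, T_{j+1}^c]$
such that $\hat{y}(t) = 0$. Hence, $\tau_1^b$ is well-defined.

Note that upon exiting each segment $[\tau_l^b, \tau_l^e]$, $\hat{y}$ switches
from 0 to 1. Hence, it incurs the startup cost $\beta_g$. However,
when $\tau_p^e = T_{j+1}^c$ and $y_j^r = 0$, the startup cost is not incurred for
critical segment $[T_j^c + 1, T_{j+1}^c]$.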

Therefore, we obtain: 



C EpSg -j (j?) - C Ep3g -j (j/ofa) 

E^W + /3 9 -i[riV^ c + i] 



p-i . T i 



+ E(E r *W + ^ 



'=2 t=rf 



(50) 
(51) 

(52) 



+ E *(t) + /3 9 y • ■ l[r; = T/ +1 ] + ft, ■ l[r p e / 3^3J 



Now we prove the terms (|51[) (|52|l and (|53|l are all no less 
than 0. 

First, if t\ = T/ + 1, then 



^nW + ft.iInVTZ + i] = E »■*(*) 

+ £ = T C + 1 

1 J 

> Ri(r*) — Ri(Tj) 

> ik(Tt) + Aj >0. 

else then 

f r^+ft-lIrf/^ + l] = 

> fli(Ti e )+j8 9 >0. 



Thus, we proved (|5ip > 0. 
Second, 



E > Ri(Tr)-Ri(rf-i)+0 B 

> Bi(7f)+j9 s >0. 

Thus, we proved (JS2I> 0. 
Last, if Tp = > tn en 

E + ■ ^ = + ^ ■ ^ ^+1] 



> E ri W ^ MtUi) - M-Tp - 1) 



= -Ri(Tj ;- 1) > 0. 



else then 



Define the sub-cost for type-h by 



= I] r 4 (t) + ft, > i?,(r p e ) - ^(r p 6 - 1) + /3 g 
> 0. 



Thus, we proved ([53jl> 0. 
So we obtain 

C EP = E -j (t/) - C Ep s g -j (j/ofa) > 0. 

(type-2): Next, suppose that critical segment $[T_j^c + 1, T_{j+1}^c]$
is type-2. Hence, $y_{\mathrm{OFA}}(t) = 0$ for all $t \in [T_j^c + 1, T_{j+1}^c]$. Note
that the above argument applies similarly to the type-2 setting,
when we consider (Case 1): $\hat{y}(t) = 1$ for all $t \in [T_j^c + 1, T_{j+1}^c]$
and (Case 2): $\hat{y}(t) = 0$ for some $t \in [T_j^c + 1, T_{j+1}^c]$.

(type-start and type-end): We note that the argument
for type-2 applies similarly to the type-start and type-end
settings.

Therefore, we complete the proof by showing $C_{\mathrm{EP}_i}^{\text{sg-}j}(\hat{y}) \ge
C_{\mathrm{EP}_i}^{\text{sg-}j}(y_{\mathrm{OFA}})$ for all $j \in [0, k]$.

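Lemma 9. Suppose $\tau_1, \tau_2 \in [T_j^c + 1, T_{j+1}^c]$ and $\tau_1 < \tau_2$.
Then,

\[ R_i(\tau_2) - R_i(\tau_1) \begin{cases} \le \sum_{t=\tau_1+1}^{\tau_2} r_i(t), & \text{if } [T_j^c + 1, T_{j+1}^c] \text{ is type-1}, \\ \ge \sum_{t=\tau_1+1}^{\tau_2} r_i(t), & \text{if } [T_j^c + 1, T_{j+1}^c] \text{ is type-2}. \end{cases} \tag{54} \]

Proof. We recall that

\[ R_i(t) \triangleq \min\big\{0, \max\{-\beta_g, R_i(t-1) + r_i(t)\}\big\}. \tag{55} \]

First, we consider $[T_j^c + 1, T_{j+1}^c]$ as type-1. This implies that
only $R_i(T_j^c) = -\beta_g$, whereas $R_i(t) > -\beta_g$ for $t \in [T_j^c + 1, T_{j+1}^c]$.
Hence,

\[ R_i(t) = \min\{0, R_i(t-1) + r_i(t)\} \le R_i(t-1) + r_i(t). \tag{56} \]

Iteratively, we obtain

\[ R_i(\tau_2) \le R_i(\tau_1) + \sum_{t=\tau_1+1}^{\tau_2} r_i(t). \tag{57} \]

When $[T_j^c + 1, T_{j+1}^c]$ is type-2, we proceed with a similar
proof, except

\[ R_i(t) = \max\{-\beta_g, R_i(t-1) + r_i(t)\} \ge R_i(t-1) + r_i(t). \tag{58} \]

Therefore,

\[ R_i(\tau_2) \ge R_i(\tau_1) + \sum_{t=\tau_1+1}^{\tau_2} r_i(t). \tag{59} \]

□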
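The clamped recursion (55) and the two one-sided bounds of Lemma 9 can be illustrated on a short sequence. This is a minimal sketch with made-up increments; the function name `r_path` and the initial value $R_i(0) = -\beta_g$ are assumptions for the example.

```python
def r_path(r, beta_g, r0=None):
    """Iterate R(t) = min{0, max{-beta_g, R(t-1) + r(t)}} as in Eqn. (55)."""
    prev = -beta_g if r0 is None else r0  # assumed starting value R(0)
    out = []
    for rt in r:
        prev = min(0, max(-beta_g, prev + rt))
        out.append(prev)
    return out


r = [1, 1, 5, -2, -10, 3]  # hypothetical increments r_i(1..6)
R = r_path(r, beta_g=4)
print(R)

# Rising stretch (type-1 like): R grows by at most the summed increments.
print(R[2] - R[0] <= r[1] + r[2])
# Falling stretch (type-2 like): R drops by at most the summed increments.
print(R[4] - R[2] >= r[3] + r[4])
```

The clamping at $0$ is what makes the first bound an inequality, and the clamping at $-\beta_g$ does the same for the second.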



E. PROOF OF THEOREM 7 

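First, we denote the set of indexes of critical segments for
type-$h$ by $\mathcal{T}_h \subseteq \{0, \dots, k\}$. Note that we also refer to
type-start and type-end by type-0 and type-3 respectively.
Define the sub-cost for type-$h$ by

\[ C_{\mathrm{EP}_i}^{\text{ty-}h}(y) \triangleq \sum_{j \in \mathcal{T}_h}\sum_{t=T_j^c+1}^{T_{j+1}^c} \Big( \psi(y(t), p(t), e_i(t)) + \beta_g \cdot [y(t) - y(t-1)]^+ \Big). \]

Hence, $C_{\mathrm{EP}_i}(y) = \sum_{h=0}^{3} C_{\mathrm{EP}_i}^{\text{ty-}h}(y)$. We prove by comparing
the sub-cost for each type-$h$. We denote the outcome of
CHASE$_s^{(w)}$ by $(y_{\mathrm{CHASE}(w)}(t))_{t=1}^{T}$.

(type-0): Note that both $y_{\mathrm{OFA}}(t) = y_{\mathrm{CHASE}(w)}(t) = 0$ for
all $t \in [1, T_1^c]$. Hence,

\[ C_{\mathrm{EP}_i}^{\text{ty-}0}(y_{\mathrm{OFA}}) = C_{\mathrm{EP}_i}^{\text{ty-}0}(y_{\mathrm{CHASE}(w)}). \]

(type-1): Based on the definition of critical segment
(Definition 1), we recall that there is an auxiliary point $\tilde{T}_j^c$, such
that either ($R_i(T_j^c) = 0$ and $R_i(\tilde{T}_j^c) = -\beta_g$) or ($R_i(T_j^c) =
-\beta_g$ and $R_i(\tilde{T}_j^c) = 0$). We focus on the segment $T_j^c + 1 + w <
\tilde{T}_j^c$. We observe

\[ y_{\mathrm{CHASE}(w)}(t) = \begin{cases} 0, & \text{for all } t \in [T_j^c + 1, \tilde{T}_j^c - w), \\ 1, & \text{for all } t \in [\tilde{T}_j^c - w, T_{j+1}^c]. \end{cases} \]

We consider a particular type-1 critical segment, i.e., the $k$-th
type-1 critical segment: $[T_j^c + 1, T_{j+1}^c]$. Note that by the
definition of type-1, $y_{\mathrm{OFA}}(T_j^c) = y_{\mathrm{CHASE}(w)}(T_j^c) = 0$. $y_{\mathrm{OFA}}(t)$
switches from 0 to 1 at time $T_j^c + 1$, while $y_{\mathrm{CHASE}(w)}$ switches
at time $\tilde{T}_j^c - w$, both incurring startup cost $\beta_g$. The cost
difference between $y_{\mathrm{CHASE}(w)}$ and $y_{\mathrm{OFA}}$ within $[T_j^c + 1, T_{j+1}^c]$
is

\[ \sum_{t=T_j^c+1}^{\tilde{T}_j^c-w-1} \Big( \psi(0, p(t), e_i(t)) - \psi(1, p(t), e_i(t)) \Big) = \sum_{t=T_j^c+1}^{\tilde{T}_j^c-w-1} r_i(t) = R_i(\tilde{T}_j^c - w - 1) - R_i(T_j^c) = q_k^1 + \beta_g, \]

where $q_k^1 = R_i(\tilde{T}_j^c - w - 1)$.

Recall the number of type-$h$ critical segments $m_h = |\mathcal{T}_h|$.

\[ C_{\mathrm{EP}_i}^{\text{ty-}1}(y_{\mathrm{CHASE}(w)}) \le C_{\mathrm{EP}_i}^{\text{ty-}1}(y_{\mathrm{OFA}}) + m_1 \cdot \beta_g + \sum_{k=1}^{m_1} q_k^1. \]

(type-2) and (type-3): We derive similarly for $h = 2$ or
$3$ as

\[ C_{\mathrm{EP}_i}^{\text{ty-}h}(y_{\mathrm{CHASE}(w)}) \le C_{\mathrm{EP}_i}^{\text{ty-}h}(y_{\mathrm{OFA}}) - \sum_{k=1}^{m_h} q_k^h \le C_{\mathrm{EP}_i}^{\text{ty-}h}(y_{\mathrm{OFA}}) + \beta_g\,m_h. \]

The last inequality comes from the fact that $q_k^h \ge -\beta_g$ for all $h, k$.

Furthermore, we note $m_1 = m_2 + m_3$. Overall, we obtain

\[ \frac{C_{\mathrm{EP}_i}(y_{\mathrm{CHASE}(w)})}{C_{\mathrm{EP}_i}(y_{\mathrm{OFA}})} = \frac{\sum_{h=0}^{3} C_{\mathrm{EP}_i}^{\text{ty-}h}(y_{\mathrm{CHASE}(w)})}{\sum_{h=0}^{3} C_{\mathrm{EP}_i}^{\text{ty-}h}(y_{\mathrm{OFA}})} \le \frac{m_1\beta_g + \sum_{k=1}^{m_1} q_k^1 + (m_2 + m_3)\beta_g + \sum_{h=0}^{3} C_{\mathrm{EP}_i}^{\text{ty-}h}(y_{\mathrm{OFA}})}{\sum_{h=0}^{3} C_{\mathrm{EP}_i}^{\text{ty-}h}(y_{\mathrm{OFA}})} \le 1 + \begin{cases} 0, & \text{if } m_1 = 0, \\ \dfrac{2m_1\beta_g + \sum_{k=1}^{m_1} q_k^1}{C_{\mathrm{EP}_i}^{\text{ty-}1}(y_{\mathrm{OFA}})}, & \text{otherwise.} \end{cases} \]

By Lemma 10 and simplifications, we obtain

\[ \frac{C_{\mathrm{EP}_i}(y_{\mathrm{CHASE}(w)})}{C_{\mathrm{EP}_i}(y_{\mathrm{OFA}})} \le 1 + \frac{2\beta_g\,(LP_{\max} - Lc_o - c_m)}{\beta_g LP_{\max} + w \cdot c_m P_{\max}\big(L - \frac{c_m}{P_{\max} - c_o}\big)} \le 1 + \frac{2(P_{\max} - c_o)}{P_{\max}(1 + w\,c_m/\beta_g)}. \tag{60} \]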



53 (^(l,p(*),ei(t)) -Cm) 

E^t/V (V» (Xp(*)> - Cm) 

E*=tp+i (^(0,p(t),e 1 (t))-V(l,p(*)»ei(*)) + Cm) 

x ^ (V>(0,p(t),e i (t))-V(l,p(*),e i (t))+c m ) 



t=T c + l 



V ) (l,p(T"),e»('r)) 



~ re[T?+i^- TO -l] V(0,p(T-),e 1 (r))-V(l,p(-r),e i (r)) + c n 
f/-w-i 

x ^ (V>(0 J p(t),e i (i))-V(l,p(t),e i (i)) + c m ) 

t=T9 + l 
C 



max "-o 
T? — 10-I 



(62) 



x ^ (V»(0,p(t),ei(t))-V(l,p(t) ) e i (t)) + c m ). 

t=T? + l 

The last inequality follows from Lemma [TT1 
Next, we bound the second term by 

(V(0,p(t),e i (t))-V(l,p(i),e i (t)) + c ro ) 

t=T c + l 



Lemma 10. 



1 1 1 j 

CEp'^yoFA) > 



+ TO • C m + 



(^+/3 g )(Lc + c m ) 

C o{-1k + W ■ Cm) \ 



max <-o 



miP, 



max VMS + WCm J 
Pmax Co 

Proof. Consider a particular type-1 segment T/ + i] 
Denote the costs of yoFA during [Tj + 1, — w — 1] and 
[f/ - w, T/ +1 ] by Cost up and Cost pt respectively. 

Step 1: We bound Cost up as follows: 



Cost up 

T9-W-1 

= p 9 + E ^(i,p(*),ci(0) 

t=T?+l 

= / 9 g + (T;-w-l-r?) Cm + (i>(l,P(t),ei(t)) 

i=T c + l 



> E M*)+< 



t=r;+i 

> i?i(f; - u; - 1) - ^(t?) + (f; - w - 1 - t; 

= ^ + /3 9 + (f/-w-l-T/)c m . 
Together, we obtain 
Cost up 

> P a + (T/ - w - 1 - T/)c m + 

9fc + A, + (T/ - w - 1 - T/)c„ 



max <-o 



^ , (g^+fe)Co + (^- m -l-T; C )P m axC m (63) 



max 



Furthermore, we note that (Tj — w — 1 — Tj) is lower 
bounded by the steepest descend when p(t) = P ma x and 
ei(t) = L, 



T- - w - 1 - T- > —r- 
1 J ~ L P, 



By Eqns. lf63 )) -l(6"4 ]l . we obtain 



Co ) Cn 



(64) 



Cost up 

(ql + p g )c + (f? - w - 1 - T/)P max c„ 



> /3 S + 



J max C 

(g£ + p g )(Lc + c m ) 



L(P 1 



(65) 



max <-o / <-m 



Step 2: We bound Cost pt as follows. 
Cost pt = tf(l,p(*),e<(*)) 

t=T9 — w 
3 

= (T/ +1 - T/ + w + l)c m + J! (V»(l.P(*),ei(t))-c m ) 

t=T?-w 
J 

> W ■ C m + 

p C °_ E (V'(0,p(t),e*(t))-^(l,p(t) ) e i (t)) + c B 

J max Co 

t=T c -™ 

3 

On the other hand, we obtain 

£ (V> (0,p(*),ei(t)) - V e * (*)) + c™ 

t=f c -w 

3 

£ n(t) + {T? +1 - f? + W + l)c m 

t=T a -w 

3 

> Ri(Tj +1 ) - Ri(fj — W — l)+W-Cm=W - Cm - Qk- 

Therefore, 

c„{w ■ Cm —ql) 



Case 2: c a < p(r). By Eqn. © and e;(r) < L,Vi,T, 
Thus, 

ip (1,p(t), ei(r)) = c e, ; (-r) + c m , 
V ) (0,p(r),e i (r)) = p(-r)e;(r). 



^(l>p(T),ei(r)) -c„ 



^(0,p(r),ei(r)) - ip (1,p(t), ei(r)) + c„ 
c D ei(r) 



p(r)ei(r) - c ej(r) 

Co 



Cost > W • Cm + 



Pmax C 



(66) 



Since there are mi type-1 critical segments, according to 
Eqns. (f65|) - (|66|) . we obtain 

Cost* 5 "" (j/OFa) 

>- -ft+E( 'g; A "^ + r' 

. ^ v V max <-o J L m 

, C (-9fe + W ■ C m ) 
+ W ■ Cm H p 

± max C 

' ilk +Pg)Co 



fe = l l/max C 0/ / 

c o(— qi + w ■ C ™) 



+W ■ Cm + 



Pixibx C-o 

mi(j3gC + 



3 ' P -r 

max <-o 

miP max (/3 g + mem) 

□ 

Lemma 11. 

^(l,p(T),ej(r)) -c m > c 

ip (0,p(r),ei(r)) - V (l,p(r), ei(r)) + c m . ~~ P max — c G ' 

Proof. We expand ip {y(r),p{r), a(r)) for each case: 
Case 1: c a > p(r). By Eqn. |[9j) and ei(r) < L, V«',r, 

<Kl,p(i"),ei(r)) = p(r)e«(r) + c m , 
V>(0,p(r),ei(r)) = p(r)ej(r). 



Therefore, 



V(l,p(*)»ei(*)) 



Therefore, 

> 

)• 

> 



Combining both cases, we complete the proof of this lemma. □ 

F. PROOF OF THEOREM 2 

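First, we prove that the factor loss in optimality is at most
$LP_{\max}/(Lc_o + c_m)$.

Then, we prove that the factor loss is tight.

Let $(\bar{x}, \bar{y})$ be the solution obtained by solving CP and EP
separately in sequence and $(x^*, y^*)$ be the solution obtained
by solving the joint-optimization DCM. Denote $C_{\mathrm{DCM}}(x, y)$
to be the cost of DCM of solution $(x, y)$ and $C_{\mathrm{CP}}(x)$ to be the cost
of CP of solution $x$.

It is straightforward that

\[ C_{\mathrm{DCM}}(\bar{x}, \bar{y}) \le C_{\mathrm{DCM}}(\bar{x}, 0). \tag{67} \]

Because $C_{\mathrm{DCM}}(x, 0) = C_{\mathrm{CP}}(x)$, we have

\[ C_{\mathrm{DCM}}(\bar{x}, 0) = C_{\mathrm{CP}}(\bar{x}) \le C_{\mathrm{CP}}(x^*) = C_{\mathrm{DCM}}(x^*, 0). \tag{68} \]

By Eqns. (67) and (68), we obtain

\[ \frac{C_{\mathrm{DCM}}(\bar{x}, \bar{y})}{C_{\mathrm{DCM}}(x^*, y^*)} \le \frac{C_{\mathrm{DCM}}(x^*, 0)}{C_{\mathrm{DCM}}(x^*, y^*)}. \tag{69} \]

Then, according to the following lemma, we get

\[ \frac{C_{\mathrm{DCM}}(\bar{x}, \bar{y})}{C_{\mathrm{DCM}}(x^*, y^*)} \le \frac{LP_{\max}}{Lc_o + c_m}. \]

Lemma 12. $C_{\mathrm{DCM}}(x^*, 0)/C_{\mathrm{DCM}}(x^*, y^*) \le LP_{\max}/(Lc_o + c_m)$.

Proof. By plugging solutions $(x^*, 0)$ and $(x^*, y^*)$ into
DCM separately, we have

\[ C_{\mathrm{DCM}}(x^*, 0) = \sum_{t=1}^{T}\Big\{ p(t)\,d_t(x^*(t)) + \beta_s\,[x^*(t) - x^*(t-1)]^+ \Big\} \tag{70} \]

and

\[ C_{\mathrm{DCM}}(x^*, y^*) = \sum_{t=1}^{T}\Big\{ \psi\big(y^*(t), p(t), d_t(x^*(t))\big) + \beta_s\,[x^*(t) - x^*(t-1)]^+ + \beta_g\,[y^*(t) - y^*(t-1)]^+ \Big\} \ge \sum_{t=1}^{T}\Big\{ \psi\big(y^*(t), p(t), d_t(x^*(t))\big) + \beta_s\,[x^*(t) - x^*(t-1)]^+ \Big\}. \tag{71} \]

By Eqns. (70) and (71), we obtain

\[ \frac{C_{\mathrm{DCM}}(x^*, 0)}{C_{\mathrm{DCM}}(x^*, y^*)} \le \frac{\sum_{t=1}^{T} p(t)\,d_t(x^*(t))}{\sum_{t=1}^{T} \psi\big(y^*(t), p(t), d_t(x^*(t))\big)} \le \max_{t \in \{1, \dots, T\}} \frac{p(t)\,d_t(x^*(t))}{\psi\big(y^*(t), p(t), d_t(x^*(t))\big)} \]
\[ \le \begin{cases} 1, & \text{if } p(t) \le c_o, \\ \dfrac{p(t)\,d_t(x^*(t))}{c_o\,d_t(x^*(t)) + c_m\lceil d_t(x^*(t))/L \rceil}, & \text{otherwise} \end{cases} \]
\[ \le \frac{P_{\max}\,d_t(x^*(t))}{c_o\,d_t(x^*(t)) + c_m\,d_t(x^*(t))/L} = \frac{LP_{\max}}{Lc_o + c_m}. \]

□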
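The per-slot ratio used in Lemma 12 can be checked by brute force on a small grid. The sketch below assumes that $\psi$ has the piecewise form recalled in Eqn. (28); the function names and the integer parameters ($L = 5$, $c_o = 3$, $c_m = 2$, $P_{\max} = 10$) are hypothetical. The check is written without division, and it compares against $\max\{1, LP_{\max}/(Lc_o + c_m)\}$ so that it also covers parameter choices where on-site generation is never worthwhile.

```python
def psi(y, p, e, L, c_o, c_m):
    """Energy cost of demand e with y active generators at grid price p,
    following the piecewise form of Eqn. (28)."""
    base = c_m * y + p * e
    if p <= c_o:
        return base  # grid is cheaper: generators produce nothing
    return base + (c_o - p) * min(e, L * y)  # generators displace grid power


def grid_only_ratio_ok(y, p, e, L, c_o, c_m, p_max):
    """Check p*e / psi(y, p, e) <= max(1, L*P_max / (L*c_o + c_m)), cross-multiplied."""
    bound_num = max(L * p_max, L * c_o + c_m)
    return p * e * (L * c_o + c_m) <= bound_num * psi(y, p, e, L, c_o, c_m)


L, c_o, c_m, p_max = 5, 3, 2, 10
ok = all(
    grid_only_ratio_ok(y, p, e, L, c_o, c_m, p_max)
    for p in range(p_max + 1)
    for e in range(21)
    for y in range(6)
)
print(ok, psi(1, 10, 5, L, c_o, c_m))
```

Equality is reached when the price is at $P_{\max}$ and the demand is an exact multiple of $L$ served entirely by generators, which mirrors the tight instance constructed for Lemma 13.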



Next, we prove that the factor loss is tight.

Lemma 13. There exists an input such that
$C_{\mathrm{DCM}}(\bar{x}, \bar{y})/C_{\mathrm{DCM}}(x^*, y^*) = LP_{\max}/(Lc_o + c_m)$.

Proof. Consider the following input: 

d t (x(t)) = e m x(t), p(t) = P max , Vt, 

and 

if *=! + *(! + — §£-), fcGN , 



a(t) 



0, otherwise, 



where e m > is a constant such that L/e m is an integer. 

Then for the above input, according to algorithm [3] it is 
easy to see that 

5i(t) = /£' ift = 1 + fc ( 1 + ^fc)' fcGN °> 
1 0, otherwise. 

Besides, according to algorithm [3] the following x* must 
be an optimal solution whatever y* is. 

x*(t) = — , Wt. 

Cm 

Without loss of generality, consider the following param- 
eter setting: 

<P S , 

ft 



^(-fmax C ) C m Cm "C 0, 

Cm-i max 



and 



Cm Pi 



p 3 p s L 

Cm H ^ (Pmax - C J > 0. 



m J max 



6m Pi 



m J max 



Since $\bar{x}$ and $x^*$ have been determined, we can apply
Theorem 9 to obtain the corresponding $\bar{y}$ and $y^*$. According
to the definition of $R_i(t)$ and the above parameter setting, given
$\bar{x}$ and $a$, the corresponding $R_i(t)$ never reaches 0. However, given
$x^*$ and $a$, the corresponding $R_i(t)$ will soon reach 0 and
never fall back to $-\beta_g$. So we have

\[ \bar{y}(t) = 0, \quad \forall t \]

and

\[ y^*(t) = 1, \quad \forall t. \]

Figure 12: Example of $a(t)$, $\bar{x}(t)$, $x^*(t)$, $\bar{y}(t)$ and $y^*(t)$.

See Fig. 12 as an example. By plugging the above $(\bar{x}, \bar{y})$
and $(x^*, y^*)$ into DCM, we have



PPmax + s L/e r , 



Cdcm(s, y) 

Cdcm(:e*, y*) Lc + c m + (Lc a + c m )/3 s /(e 

PPmax[l + /? s /( 

(Lc„ + cm) [l + p s /( 



Lc + C 71 



□ 



Theorem 2 follows from Eqn. (69) and Lemmas 12 and 13.

G. PROOF OF THEOREM 8 

Let (x, y) be an offline optimal solution obtained by solv- 
ing CP and EP separately in sequence and (x*,y*) be 
an offline optimal solution obtained by solving the joint- 
optimization DCM. Let x on be the solution obtained by 
GCSR (w) and y off be an offline optimal solution of EP 
given input x° n . Let (x on ,y on ) be the solution obtained 
by DCMON (w) . Denote Cdcm (x,y) to be cost of DCM 
of solution (x, y) and Ccp(:e) to be cost of CP of solution 
x. 

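According to Theorem [??], Eqn. (60) and the fact that the available look-ahead window size is only $[w - \Delta_s]^+$ for DCMON$^{(w)}$ to solve EP (discussed in Sec. ??), we have

$$\begin{aligned}
\frac{C_{\mathrm{DCM}}(x^{on}, y^{on})}{C_{\mathrm{DCM}}(x^{on}, y^{off})}
&\le 1 + \frac{2\beta_g\left(LP_{\max} - Lc_o - c_m\right)}{\beta_g LP_{\max} + [w - \Delta_s]^+ c_m P_{\max}\left(L - \frac{c_m}{P_{\max} - c_o}\right)} \\
&\le 1 + \frac{2\left(LP_{\max} - Lc_o - c_m\right)}{LP_{\max} + \alpha_g P_{\max}\left(L - \frac{c_m}{P_{\max} - c_o}\right)} \\
&\le 1 + \frac{2\left(P_{\max} - c_o\right)}{P_{\max}\left(1 + \alpha_g\right)},
\end{aligned} \tag{72}$$

where $\Delta_s = \beta_s/(d_{\min}P_{\min})$ and $\alpha_g = \frac{c_m}{\beta_g}[w - \Delta_s]^+$ is a "normalized" look-ahead window size that takes values in $[0, +\infty)$.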
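As a minimal numerical sketch (not part of the original proof), the chain of bounds in (72) can be evaluated for concrete parameters. The function name and the sample values are hypothetical. The three expressions are assumed to be $1 + 2\beta_g(LP_{\max} - Lc_o - c_m)/(\beta_g LP_{\max} + [w-\Delta_s]^+ c_m P_{\max}(L - c_m/(P_{\max}-c_o)))$, the same quantity rewritten with $\alpha_g = (c_m/\beta_g)[w-\Delta_s]^+$, and the looser closed form $1 + 2(P_{\max}-c_o)/(P_{\max}(1+\alpha_g))$. The sketch also assumes $P_{\max} > c_o + c_m/L$, so that all terms are positive.

```python
def lookahead_bounds(L, P_max, c_o, c_m, beta_g, w, delta_s):
    """Evaluate the three successive upper bounds assumed for Eqn. (72).

    Returns (alpha_g, b1, b2, b3). All names are illustrative only.
    """
    if P_max <= c_o + c_m / L:
        raise ValueError("sketch assumes P_max > c_o + c_m / L")

    w_eff = max(w - delta_s, 0.0)        # [w - Delta_s]^+
    alpha_g = c_m / beta_g * w_eff       # normalized look-ahead window

    gap = L * P_max - L * c_o - c_m      # L*P_max - L*c_o - c_m
    slack = L - c_m / (P_max - c_o)      # L - c_m / (P_max - c_o)

    b1 = 1 + 2 * beta_g * gap / (beta_g * L * P_max + w_eff * c_m * P_max * slack)
    b2 = 1 + 2 * gap / (L * P_max + alpha_g * P_max * slack)
    b3 = 1 + 2 * (P_max - c_o) / (P_max * (1 + alpha_g))
    return alpha_g, b1, b2, b3


if __name__ == "__main__":
    print(lookahead_bounds(L=10, P_max=5, c_o=1, c_m=2, beta_g=20, w=6, delta_s=2))
```

Under these assumptions the first two expressions coincide (the second is the first divided through by $\beta_g$), and the third is never smaller, which is what the chain of inequalities requires.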

According to Theorem [2] we have

$$\frac{C_{\mathrm{DCM}}(\bar{x}, \bar{y})}{C_{\mathrm{DCM}}(x^*, y^*)} \le \frac{LP_{\max}}{Lc_o + c_m}. \tag{73}$$



Then if we can bound $C_{\mathrm{DCM}}(x^{on}, y^{off})/C_{\mathrm{DCM}}(\bar{x}, \bar{y})$, we obtain the competitive ratio upper bound of DCMON$^{(w)}$. The following lemma gives us such a bound.



Lemma 14. $C_{\mathrm{DCM}}(x^{on}, y^{off})/C_{\mathrm{DCM}}(\bar{x}, \bar{y}) \le 2 - \alpha_s$, where $\alpha_s = \min(1, w/\Delta_s)$ and $\Delta_s = \beta_s/(d_{\min}P_{\min})$.
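To illustrate how this factor enters the overall bound, the sketch below multiplies the three ratios $C_{\mathrm{DCM}}(x^{on}, y^{on})/C_{\mathrm{DCM}}(x^{on}, y^{off})$, $C_{\mathrm{DCM}}(x^{on}, y^{off})/C_{\mathrm{DCM}}(\bar{x}, \bar{y})$ and $C_{\mathrm{DCM}}(\bar{x}, \bar{y})/C_{\mathrm{DCM}}(x^*, y^*)$, using the final bound of (72), the bound $2 - \alpha_s$ of Lemma 14, and (73). The function names and sample numbers are hypothetical, and the closed form $1 + 2(P_{\max} - c_o)/(P_{\max}(1 + \alpha_g))$ with $\alpha_g = (c_m/\beta_g)[w - \Delta_s]^+$ is an assumption about the last line of (72).

```python
def alpha_s(w, beta_s, d_min, p_min):
    """Normalized look-ahead for the server side: min(1, w / Delta_s)."""
    delta_s = beta_s / (d_min * p_min)   # Delta_s = beta_s / (d_min * P_min)
    return min(1.0, w / delta_s)


def overall_bound(L, P_max, c_o, c_m, beta_g, beta_s, d_min, p_min, w):
    """Product of the three factors (illustrative helper, not from the paper)."""
    delta_s = beta_s / (d_min * p_min)
    a_s = min(1.0, w / delta_s)
    a_g = c_m / beta_g * max(w - delta_s, 0.0)

    factor_72 = 1 + 2 * (P_max - c_o) / (P_max * (1 + a_g))  # assumed last line of (72)
    factor_14 = 2 - a_s                                      # Lemma 14
    factor_73 = L * P_max / (L * c_o + c_m)                  # Eqn. (73)
    return factor_72 * factor_14 * factor_73


if __name__ == "__main__":
    print(overall_bound(L=10, P_max=5, c_o=1, c_m=2, beta_g=20,
                        beta_s=6, d_min=1.5, p_min=2, w=6))
```

With no look-ahead ($w = 0$) the middle factor equals 2, and it drops to 1 once $w \ge \Delta_s$.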

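Proof. It is straightforward that

$$C_{\mathrm{DCM}}(x^{on}, y^{off}) \le C_{\mathrm{DCM}}(x^{on}, \bar{y}). \tag{74}$$

So we seek to bound $C_{\mathrm{DCM}}(x^{on}, \bar{y})/C_{\mathrm{DCM}}(\bar{x}, \bar{y})$. For solutions $x^{on}$ and $\bar{x}$, denote

$$CW(x) = \beta_s \sum_{t=1}^{T} \left[x(t) - x(t-1)\right]^+ \tag{75}$$

and

$$CI(x^{on}, \bar{x}) = \sum_{t=1}^{T} p(t)\left(d_t(x^{on}(t)) - d_t(\bar{x}(t))\right). \tag{76}$$

According to Eqn. (36), we have

$$CW(x_i^{on}) = CW(\bar{x}_i). \tag{77}$$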



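According to Lemma [15] and the fact that $x_i^{on}(t), \bar{x}_i(t) \in \{0, 1\}$, $\forall t, i$, we have

$$\begin{aligned}
CW(x^{on}) &= CW\Big(\sum_{i=1}^{M} x_i^{on}\Big) = \sum_{i=1}^{M} CW(x_i^{on}) \\
&= \sum_{i=1}^{M} CW(\bar{x}_i) = CW\Big(\sum_{i=1}^{M} \bar{x}_i\Big) \\
&= CW(\bar{x}),
\end{aligned} \tag{78}$$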
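The additivity step $CW(\sum_i x_i) = \sum_i CW(x_i)$ relies on the per-server sequences being 0/1-valued and nested (decreasing in $i$ for every $t$). A small sketch, with hypothetical helper names and an assumed initial state $x(0) = 0$, checks this on a toy trajectory by splitting an integer sequence into threshold layers $x_i(t) = 1$ iff $x(t) \ge i$.

```python
def switching_cost(x, beta_s, x0=0):
    """CW(x) = beta_s * sum_t [x(t) - x(t-1)]^+, assuming x(0) = x0."""
    cost, prev = 0.0, x0
    for cur in x:
        cost += beta_s * max(cur - prev, 0)
        prev = cur
    return cost


def layers(x, M):
    """Split integer sequence x into M nested 0/1 sequences (x_i(t) = 1 iff x(t) >= i)."""
    return [[1 if v >= i else 0 for v in x] for i in range(1, M + 1)]


if __name__ == "__main__":
    x = [2, 0, 3, 1, 1, 4]            # toy number of active servers per slot
    beta_s = 1.5
    total = switching_cost(x, beta_s)
    per_layer = sum(switching_cost(xi, beta_s) for xi in layers(x, M=4))
    print(total, per_layer)
```

The two printed values agree because an increase of $x(t)$ by $k$ turns on exactly $k$ layers. For sequences that are not nested the identity can fail, which is why Lemma [15] is needed.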



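and

$$\begin{aligned}
CI(x^{on}, \bar{x}) &= \sum_{t=1}^{T} p(t)\left(d_t(x^{on}(t)) - d_t(\bar{x}(t))\right) \\
&= \sum_{t=1}^{T} p(t)\Big(\sum_{i=1}^{x^{on}(t)} d_t^i - \sum_{i=1}^{\bar{x}(t)} d_t^i\Big) \\
&= \sum_{i=1}^{M}\sum_{t=1}^{T} p(t)\, d_t^i \left(x_i^{on}(t) - \bar{x}_i(t)\right) \\
&\le (1 - \alpha_s)\sum_{i=1}^{M} CW(\bar{x}_i) \\
&= (1 - \alpha_s)\, CW(\bar{x}),
\end{aligned} \tag{79}$$

where the last equality and the preceding inequality come from Eqns. (78) and (39), respectively.

According to Eqn. (??), we have, $\forall b \in [0, x^{on}(t)]$,

$$\psi\left(\bar{y}(t), p(t), d_t(x^{on}(t))\right) - \psi\left(\bar{y}(t), p(t), d_t(b)\right) \le p(t)\left(d_t(x^{on}(t)) - d_t(b)\right). \tag{80}$$

By the definition of DCM, Eqns. (??), (75), (76) and (80),

$$\begin{aligned}
C_{\mathrm{DCM}}(x^{on}, \bar{y})
&= \sum_{t=1}^{T}\Big\{\psi\left(\bar{y}(t), p(t), d_t(x^{on}(t))\right) + \beta_s\left[x^{on}(t) - x^{on}(t-1)\right]^+ + \beta_g\left[\bar{y}(t) - \bar{y}(t-1)\right]^+\Big\} \\
&\le \sum_{t=1}^{T}\Big\{\psi\left(\bar{y}(t), p(t), d_t(\bar{x}(t))\right) + p(t)\left(d_t(x^{on}(t)) - d_t(\bar{x}(t))\right) \\
&\qquad\quad + \beta_s\left[x^{on}(t) - x^{on}(t-1)\right]^+ + \beta_g\left[\bar{y}(t) - \bar{y}(t-1)\right]^+\Big\} \\
&= \sum_{t=1}^{T}\Big\{\psi\left(\bar{y}(t), p(t), d_t(\bar{x}(t))\right) + \beta_g\left[\bar{y}(t) - \bar{y}(t-1)\right]^+\Big\} + CW(x^{on}) + CI(x^{on}, \bar{x}).
\end{aligned} \tag{81}$$

Then, by Eqns. (78), (79) and (81), we have

$$\begin{aligned}
\frac{C_{\mathrm{DCM}}(x^{on}, \bar{y})}{C_{\mathrm{DCM}}(\bar{x}, \bar{y})}
&\le \frac{\sum_{t=1}^{T}\big\{\psi\left(\bar{y}(t), p(t), d_t(\bar{x}(t))\right) + \beta_g\left[\bar{y}(t) - \bar{y}(t-1)\right]^+\big\} + CI(x^{on}, \bar{x}) + CW(x^{on})}{\sum_{t=1}^{T}\big\{\psi\left(\bar{y}(t), p(t), d_t(\bar{x}(t))\right) + \beta_g\left[\bar{y}(t) - \bar{y}(t-1)\right]^+\big\} + CW(\bar{x})} \\
&\le \frac{(1 - \alpha_s)\, CW(\bar{x}) + CW(x^{on})}{CW(\bar{x})} \\
&= \frac{(1 - \alpha_s)\, CW(\bar{x}) + CW(\bar{x})}{CW(\bar{x})} \\
&= 2 - \alpha_s.
\end{aligned} \tag{82}$$

This lemma follows from Eqns. (74) and (82). □

Theorem [8] follows from Eqns. (72), (73) and Lemma [14].

Lemma 15. $\bar{x}_1, \bar{x}_2, \ldots, \bar{x}_M$ and $x_1^{on}, x_2^{on}, \ldots, x_M^{on}$ are decreasing sequences, i.e., $\forall t$, $\bar{x}_1(t) \ge \ldots \ge \bar{x}_M(t)$ and $x_1^{on}(t) \ge \ldots \ge x_M^{on}(t)$.

Proof. Recall that $\bar{x}_i$ and $x_i^{on}$ are the offline and online solutions obtained by CPOFF$_s$ and GCSR$_s^{(w)}$ for problem CP$_i$, respectively. According to the definition of CP$_i$, $a_1(t) \ge a_2(t) \ge \ldots \ge a_M(t)$ is a decreasing sequence and $d_t^1 \le d_t^2 \le \ldots \le d_t^M$ is an increasing sequence. Thus, for problem CP$_i$, the larger the index $i$ is, the sparser the workload tends to be and the higher the power consumption tends to be. Hence, for a larger index $i$, there are more "idling intervals"; meanwhile, both CPOFF$_s$ and GCSR$_s^{(w)}$ tend to keep servers idling less during idling intervals (because the idling cost is higher). So $\bar{x}_1, \bar{x}_2, \ldots, \bar{x}_M$ and $x_1^{on}, x_2^{on}, \ldots, x_M^{on}$ are decreasing sequences, i.e., $\forall t$, $\bar{x}_1(t) \ge \ldots \ge \bar{x}_M(t)$ and $x_1^{on}(t) \ge \ldots \ge x_M^{on}(t)$. □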



